Abstract. Continuous-time dynamics for learning and evolution in games are 
first order systems where payoffs are used to determine the growth rate of the 
players' strategy shares. In this paper, we examine what happens beyond this 
first order framework by viewing payoffs as higher order forces, specifying e.g. 
the acceleration of the players' evolution. To that end, we derive a wide class of 
higher order game dynamics, generalizing all first order imitative dynamics, and, 
in particular, the replicator dynamics. 

In stark contrast to the first order case, we show that weakly dominated strategies 
become eliminated in all n-th order payoff-monotonic dynamics for $n \geq 2$; moreover, 
strictly dominated strategies become extinct in n-th order dynamics n orders as 
fast as in first order. Finally, we also establish a higher order analogue of the 
folk theorem of evolutionary game theory; remarkably, as a consequence of this 
higher order mode of learning, the rate of convergence of n-th order dynamics to 
strict equilibria turns out to be n orders as fast as in their first order counterparts. 



1. Introduction 

Owing to the considerable complexity of computing Nash equilibria and other
rationalizable outcomes in non-cooperative games, a fundamental question that
arises is whether these outcomes may be regarded as the result of a simple dynamic
process where "the participants are supposed to accumulate empirical information
on the relative advantages of the various pure strategies at their disposal"
(Nash, 1950, p. 21). To that end, numerous classes of game dynamics have been
proposed (from both a learning and an evolutionary "mass-action" perspective),
each with its own distinct set of traits and characteristics; see e.g. the
comprehensive survey by Sandholm (2011) for a recent account.

Be that as it may, there are very few rationality properties that are shared by a
decisive majority of classes of game dynamics, even if we focus for simplicity on
the continuous-time, deterministic regime. For instance, a simple comparison between
the well-known Smith dynamics (Smith, 1984) and the replicator dynamics
(Taylor and Jonker, 1978) reveals that game dynamics can be innovative (Smith)
or imitative (replicator); strictly dominated strategies might survive (Hofbauer
and Sandholm, 2011) or, on the contrary, become extinct (Samuelson and Zhang,
1992); rest points might coincide with the Nash set of the game (Hofbauer and
Sandholm, 2009) or properly contain it instead; etc. On the other hand, negative
results seem to be much more ubiquitous: Hart and Mas-Colell (2003) showed
that there is no class of uncoupled game dynamics that always converges to
equilibrium, and even worse, weakly dominated strategies may survive in the long
run, even in simple $2 \times 2$ games (Samuelson, 1993; Weibull, 1995).

As one might expect, the view is not particularly more uniform from a mathematical
standpoint either: perhaps the single unifying feature of the vast majority
of (deterministic, continuous-time) game dynamics is that they are first order
dynamical systems that evolve over a product of simplices (the game's mixed
strategy space). Beyond this basic attribute, however, game dynamics hardly share any
other resemblances: even continuity (an otherwise standard regularity assumption)
is absent in several important classes of dynamics, e.g. in the projection
dynamics of Sandholm, Dokumaci, and Lahkar (2008).

Quite interestingly, in the closely related field of optimization (which can be 
likened to playing against nature), this restriction to first order dynamics is not 
present. To wit, as was recently shown by Alvarez, Attouch et al. (Alvarez, 2000; 
Alvarez, Attouch, Bolte, and Redont, 2002; Attouch, Goudou, and Redont, 2000), 
second order gradient ascent (the so-called "heavy ball with friction" method) has 
some remarkable optimization properties that first order schemes do not possess. 
Indeed, by interpreting the gradient of the function to be maximized as a physical, 
Newtonian force (and not as a first order vector field to be dutifully tracked by the 
system's trajectories), in many instances one can give the system enough energy 
to escape the basins of attraction of local maxima and converge instead to the 
global maximum of the objective function (something which is not possible in 
ordinary first order dynamics). 

This, therefore, raises the question: can second (or higher) order dynamics be
introduced in a game theoretic setting? And if yes, can we by so doing obtain better
convergence results and/or escape the impossibility results of first order dynamics?

The first key challenge to overcome in this endeavor is that game dynamics
must first and foremost respect the constraints of the game's strategy space. To
circumvent this impasse, Flam and Morgan (2004) proposed a heavy-ball method
as in Attouch et al. (2000) above, and they exogenously enforced consistency by
projecting the orbits' velocity to a subspace of admissible directions when the
updating would lead to inadmissible strategy profiles (say, assigning negative
probabilities). Unfortunately, as is often the case with projection-based schemes
(see e.g. Sandholm et al., 2008), the resulting dynamics are not continuous, so
even basic existence and uniqueness results are hard to obtain.

On the other hand, if players try to improve their performance by aggregating
information on the relative payoff differences of their pure strategies, then
this cumulative empirical data is not constrained (as mixed strategies are). Thus,
a promising way to obtain a well-behaved second (or higher) order dynamical
system for learning in games is to use the players' accumulated data to define
an unconstrained performance measure for each strategy (this is where the dynamics
of the process come in), and then map these "scores" to mixed strategies
by means of e.g. an (inverse) logit choice model (Hofbauer, Sorin, and Viossat,
2009; Mertikopoulos and Moustakas, 2010; Rustichini, 1999; Sorin, 2009). In other
words, the dynamics can first be specified on an unconstrained space, and then
reflected on the game's actual strategy space via the players' choice model (which
produces a mixed strategy based on each pure strategy's score).

Outline of Results. After a few preliminaries in Section 2, this approach is made 
precise in Section 3 where we present a higher order framework for the familiar 
class of imitative dynamics (Bjornerstedt and Weibull, 1996), a class containing all 
payoff-monotonic dynamics, and in particular, the replicator dynamics. In fact, 
as a consequence of the passage from performance scores to mixed strategies, 



HIGHER ORDER GAME DYNAMICS 






the resulting dynamics naturally inherit a game-independent "adjustment" term 
which slows down the orbits that approach the boundary of the game's strategy 
space and renders the latter invariant. 

Regarding the rationality properties of the derived dynamics, we show in Section 4
that payoff-monotonic dynamics of any order eliminate strictly dominated
strategies, including iteratively dominated ones: in the long run, only rationalizable
strategies can survive. Qualitatively, this result is the same as its first order
analogue; quantitatively, however, the rate of extinction increases dramatically
with the order of the dynamics considered: dominated strategies become extinct in
n-th order dynamics n orders as fast as in their first order counterparts (Theorem 4.1).
The reason for this enhanced rate of elimination is that empirical data accrues
much faster if a higher order scheme is used rather than a lower order one:^1 players
who look deeper into the past by using a higher order learning rule identify
consistent payoff differences much faster, so they annihilate dominated strategies
much faster as well.

A remarkable consequence of the above is that in all higher order ($n \geq 2$)
payoff-monotonic dynamics, even weakly dominated strategies become extinct
(Theorem 4.4). Needless to say, this comes in stark contrast to the first order setting
where weakly dominated strategies survive even in simple $2 \times 2$ games such as
Entry Deterrence (Weibull, 1995, Ex. 5.4). As such, the implementation of a higher
order learning rule carries significant ramifications for the justification of rational
behavior: the elimination of weakly dominated strategies can be interpreted
as the outcome of a learning process, simply by considering more sophisticated
players who look deeper into the past.

Extending our analysis to equilibrium play, we show in Section 5 that the folk
theorem of evolutionary game theory (Hofbauer and Sigmund, 1988; Weibull,
1995) continues to hold in our setting as well (modulo certain technical
modifications needed to accommodate higher order dynamics). More specifically, in all
higher order payoff-monotonic dynamics we show that: a) if an interior solution
orbit converges, then its limit is Nash; b) if a point is Lyapunov stable, then it is
also Nash; and c) if players start close enough to a strict equilibrium and with a
small enough higher order learning bias, then they converge to it (Theorem 5.1).
In fact, echoing our results on the rate of extinction of dominated strategies, we
show that n-th order payoff-monotonic dynamics converge to strict equilibria n
orders as fast as their first order counterparts.

As a converse to (c) in first order dynamics, it is well-known that the flow of the
multi-population replicator dynamics is "incompressible" (volume-preserving),
so its orbits cannot coalesce to an interior point (Hofbauer, 1996; Hofbauer and
Sigmund, 1988; Ritzberger and Vogelsberger, 1990); as a result, a point is
asymptotically stable if and only if it is a strict equilibrium (Ritzberger and Weibull,
1995). That said, more general dynamics are not incompressible, so this important
equivalence ceases to hold: for instance, even the payoff-adjusted replicator
dynamics of Maynard Smith exhibit interior attractors in simple $2 \times 2$ games (see
e.g. Ex. 5.3 in Weibull, 1995).

On the other hand, given that payoffs do not depend on the strategies' growth
rates (or other derivatives), incompressibility turns out to be an intrinsic property
of all imitative higher order dynamics for $n \geq 2$, so non-pure points cannot
be attracting in any higher order imitative dynamics (Theorem 5.4). Higher order
learning is thus seen to reinforce the link between the "epistemic" (as Ritzberger
and Weibull (1995) and van Damme (1987) call it) and the dynamic instability of
mixed equilibria: if players subscribe to a higher order learning scheme, then even lossless
off-equilibrium deviations become inherently unstable.

2. Notation and Preliminaries 

2.1. Notational conventions. Given a finite set $S = \{s_\alpha\}_{\alpha=0}^{n}$, the vector space
spanned by $S$ will be the set of formal linear combinations of elements of $S$ with
real coefficients, i.e. the set $\mathbb{R}^S = \mathrm{Hom}(S, \mathbb{R})$ of all maps $x\colon S \to \mathbb{R}$. Clearly, the
space $\mathbb{R}^S$ admits a canonical basis $\{e_\alpha\}_{\alpha=0}^{n}$ consisting of the indicator functions
$e_\alpha\colon S \to \mathbb{R}$ which take the value $e_\alpha(s_\alpha) = 1$ on $s_\alpha$ and vanish otherwise. Hence,
under the natural identification $s_\alpha \mapsto e_\alpha$, we will make no distinction between the
elements $s_\alpha$ of $S$ and the corresponding basis vectors $e_\alpha$ of $\mathbb{R}^S$; moreover, we will
frequently use the index $\alpha$ to refer to either $s_\alpha$ or $e_\alpha$, and we will identify the set
$\Delta(S)$ of probability measures on $S$ with the standard n-dimensional simplex of
$\mathbb{R}^S$: $\Delta(S) \equiv \{x \in \mathbb{R}^S : \sum_\alpha x_\alpha = 1 \text{ and } x_\alpha \geq 0\}$.

In a similar vein, if $\{S_k\}_{k \in \mathcal{K}}$ is a family of finite sets $S_k$ indexed by $k \in \mathcal{K}$,
we will also use the shorthand $\sum_\alpha^k$ for the sum $\sum_{\alpha \in S_k}$. Finally, regarding players
and their actions, we will follow the original convention of Nash and employ
Latin indices ($j, k, \dots$) for players, while keeping Greek ones ($\alpha, \beta, \dots$) for their
actions (pure strategies); also, unless otherwise mentioned, we will use $\alpha, \beta, \dots$
for indices that start at 0, and $\mu, \nu, \dots$ for those which start at 1.

2.2. Finite games. For our purposes, a finite game in normal form will consist of a
finite set of players $\mathcal{N} = \{1, \dots, N\}$, each with a finite set of actions (or pure
strategies) $\mathcal{A}_k = \{\alpha_{k,0}, \alpha_{k,1}, \dots\}$ that can be mixed by means of a probability distribution
(or mixed strategy) $x_k = (x_{k,0}, x_{k,1}, \dots) \in \Delta(\mathcal{A}_k)$. The set $\Delta(\mathcal{A}_k)$ of a player's mixed
strategies will be denoted by $X_k$, and aggregating over all players, the space of
strategy profiles $x = (x_1, \dots, x_N) \in \prod_k \mathbb{R}^{\mathcal{A}_k}$ will be the product $X \equiv \prod_k X_k$; in this
way, if $\mathcal{A} = \coprod_k \mathcal{A}_k$ denotes the (disjoint) union of the players' action sets, $X$ can
be seen as a product of simplices embedded in $\mathbb{R}^{\mathcal{A}} \cong \prod_k \mathbb{R}^{\mathcal{A}_k}$.

As is customary, when we wish to focus on the strategy of a particular (focal)
player $k \in \mathcal{N}$ versus that of his opponents $\mathcal{N}_{-k} \equiv \mathcal{N} \setminus \{k\}$, we will employ the
shorthand $(x_k; x_{-k}) \equiv (x_1, \dots, x_k, \dots, x_N) \in X$ to denote the strategy profile where
player $k$ plays $x_k \in X_k$ against his opponents' strategy $x_{-k} \in X_{-k} \equiv \prod_{\ell \neq k} X_\ell$. The
players' (expected) rewards are then prescribed by the game's payoff (or utility)
functions $u_k\colon X \to \mathbb{R}$:

\[ u_k(x) = \sum\nolimits_{\alpha_1}^{1} \cdots \sum\nolimits_{\alpha_N}^{N} u_k(\alpha_1, \dots, \alpha_N)\, x_{1,\alpha_1} \cdots x_{N,\alpha_N}, \tag{2.1} \]

where $u_k(\alpha_1, \dots, \alpha_N)$ denotes the reward of player $k$ in the profile $(\alpha_1, \dots, \alpha_N) \in \prod_k \mathcal{A}_k$;
specifically, if player $k$ plays $\alpha \in \mathcal{A}_k$, we will use the notation:

\[ u_{k\alpha}(x) \equiv u_k(\alpha; x_{-k}) = u_k(x_1, \dots, \alpha, \dots, x_N). \tag{2.2} \]
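
To make (2.1) and (2.2) concrete, the following minimal sketch evaluates both expressions in pure Python. The function names (`payoff`, `payoff_vs_pure`), the dictionary encoding of the payoff table and the $2 \times 2$ example are illustrative assumptions, not part of the paper.

```python
from itertools import product

def payoff(u_k, x):
    """Expected payoff (2.1): sum over pure profiles of u_k(a_1..a_N) * prod_j x[j][a_j].

    u_k maps a tuple of pure actions (one index per player) to player k's reward;
    x is a list of mixed strategies (one probability list per player).
    """
    total = 0.0
    for profile in product(*(range(len(x_j)) for x_j in x)):
        weight = 1.0
        for j, a in enumerate(profile):
            weight *= x[j][a]
        total += u_k[profile] * weight
    return total

def payoff_vs_pure(u_k, k, alpha, x):
    """Payoff (2.2): u_{k,alpha}(x) = u_k(alpha; x_{-k})."""
    pure = [1.0 if a == alpha else 0.0 for a in range(len(x[k]))]
    return payoff(u_k, x[:k] + [pure] + x[k + 1:])

# Hypothetical 2x2 example: rewards of player 0 in a coordination-type game.
u0 = {(0, 0): 2.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0}
x = [[0.5, 0.5], [0.25, 0.75]]
```

Since (2.1) is linear in each player's own mixed strategy, `payoff(u0, x)` coincides with the average of `payoff_vs_pure(u0, 0, a, x)` weighted by `x[0][a]`.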

In light of the above, a game in normal form with players $k \in \mathcal{N}$, action sets $\mathcal{A}_k$,
and payoff functions $u_k\colon X \to \mathbb{R}$ will be denoted by $\mathfrak{G} \equiv \mathfrak{G}(\mathcal{N}, \mathcal{A}, u)$. A subgame of
$\mathfrak{G}$ will then be a game $\mathfrak{G}' \equiv \mathfrak{G}'(\mathcal{N}, \mathcal{A}', u')$ played by the players of $\mathfrak{G}$, each with
a subset $\mathcal{A}_k' \subseteq \mathcal{A}_k$ of their original actions, and with payoff functions $u_k' \equiv u_k|_{X'}$
suitably restricted to the reduced strategy space $X' = \prod_k \Delta(\mathcal{A}_k')$.

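Now, given a game $\mathfrak{G} \equiv \mathfrak{G}(\mathcal{N}, \mathcal{A}, u)$, we will say that the pure strategy $\alpha \in \mathcal{A}_k$
is (strictly) dominated by $\beta \in \mathcal{A}_k$ (and we will write $\alpha \prec \beta$) when

\[ u_{k\alpha}(x) < u_{k\beta}(x) \quad \text{for all strategy profiles } x \in X. \tag{2.3} \]

More generally, we will say that $q_k \in X_k$ is dominated by $q_k' \in X_k$ if

\[ u_k(q_k; x_{-k}) < u_k(q_k'; x_{-k}) \quad \text{for all strategies } x_{-k} \in X_{-k} \text{ of } k\text{'s opponents.} \tag{2.4} \]

Finally, if the above inequalities are only strict for some (but not all) $x \in X$, and hold
as weak inequalities otherwise, then we will employ the term weakly dominated and
write $q_k \preccurlyeq q_k'$ instead.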

By removing dominated (and, thus, rationally unjustifiable) strategies from a
game, other strategies might become dominated in the resulting subgame, leading
to the notion of iteratively dominated strategies. Specifically, given two subsets
$M_k$ and $M_{-k}$ of $X_k$ and $X_{-k}$ respectively, let

\[ \mathrm{Just}(M_k, M_{-k}) = \{ q_k \in M_k : \forall q_k' \in M_k,\ \exists q_{-k} \in M_{-k} \text{ s.t. } u_k(q_k; q_{-k}) \geq u_k(q_k'; q_{-k}) \} \]

be the set of strategies $q_k \in M_k$ that are justifiable (i.e. not dominated) with respect
to any strategy $q_{-k} \in M_{-k}$. Then, starting with $X_k^0 \equiv X_k$, define inductively
the set of strategies that survive $r$ elimination rounds as $X_k^r = \mathrm{Just}(X_k^{r-1}, X_{-k}^{r-1})$
where $X_{-k}^{r-1} = \prod_{\ell \neq k} X_\ell^{r-1}$; similarly, the pure strategies that survive after $r$ rounds
will be denoted by $\mathcal{A}_k^r = \mathcal{A}_k \cap X_k^r$. In this way, the sequence $\{X_k^r\}_{r=0}^{\infty}$ forms a
descending chain $X_k^0 \supseteq X_k^1 \supseteq \dots$ whose limit $X_k^\infty = \bigcap_{r=0}^{\infty} X_k^r$ consists of those
strategies of player $k$ that are rationalizable, i.e. they survive all rounds of
elimination. In particular, if the space $X^\infty = \prod_k X_k^\infty$ of rationalizable strategies is a
singleton, $\mathfrak{G}$ will be called dominance-solvable and the sole surviving strategy in
$X^\infty$ will be the game's rational solution.
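
As a rough illustration of the elimination rounds described above, the sketch below iteratively removes pure strategies in a two-player game. It is a simplification made for this example: it only tests domination of a pure strategy by another pure strategy, whereas the operator Just also accounts for mixed strategies, so it may keep strategies that the full procedure would discard. The function name and the payoff matrices are hypothetical.

```python
def iterated_elimination(A, B):
    """Iteratively remove pure strategies strictly dominated by another pure strategy.

    A[i][j], B[i][j]: payoffs of the row and column player at the pure profile (i, j).
    Returns the lists of surviving row and column indices.
    """
    rows = list(range(len(A)))
    cols = list(range(len(A[0])))
    changed = True
    while changed:
        changed = False
        # Row i is dominated by row r if A[i][j] < A[r][j] for every surviving column j.
        for i in list(rows):
            if any(all(A[i][j] < A[r][j] for j in cols) for r in rows if r != i):
                rows.remove(i)
                changed = True
        # Column j is dominated by column c if B[i][j] < B[i][c] for every surviving row i.
        for j in list(cols):
            if any(all(B[i][j] < B[i][c] for i in rows) for c in cols if c != j):
                cols.remove(j)
                changed = True
    return rows, cols

# Hypothetical game: column 1 is dominated outright; row 1 only becomes
# dominated once column 1 has been removed.
A_ex = [[2, 0], [1, 3]]
B_ex = [[1, 0], [2, 1]]
survivors = iterated_elimination(A_ex, B_ex)
```

By multilinearity of the payoffs, checking the inequality in (2.3) against the opponents' surviving pure strategies is enough to cover all of their mixed strategies in the current subgame.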

Assume now that play evolves over time, say along the path $x(t) \in X$, $t \geq 0$.
In that case, we will say that a pure strategy $\alpha \in \mathcal{A}_k$ becomes extinct along $x(t)$
if $x_{k\alpha}(t) \to 0$ as $t \to \infty$; more generally, for mixed strategies $q_k \in X_k$, we will
follow Samuelson and Zhang (1992) and say that $q_k$ becomes extinct along $x(t)$
if $\min\{x_{k\alpha}(t) : \alpha \in \mathrm{supp}(q_k)\} \to 0$, with the minimum taken over the support
$\mathrm{supp}(q_k) = \{\beta \in \mathcal{A}_k : q_{k\beta} > 0\}$ of $q_k$.

The above is equivalent to asking that the quantity $V_k(x) = \prod_{\alpha \in \mathrm{supp}(q_k)} (x_{k\alpha})^{q_{k\alpha}}$
vanish as $t \to \infty$. Modulo a constant and a change of sign, the logarithm of this
quantity is known as the Kullback-Leibler divergence $D_{\mathrm{KL}}(q_k \,\|\, x_k)$ of $x_k$ with respect
to $q_k$; more precisely, we have:

\[ D_{\mathrm{KL}}(q_k \,\|\, x_k) = \sum_{\alpha \in \mathrm{supp}(q_k)} q_{k\alpha} \log\left( q_{k\alpha} / x_{k\alpha} \right). \tag{2.5} \]

Clearly, $D_{\mathrm{KL}}(q_k \,\|\, x_k)$ blows up to $+\infty$ whenever $\min\{x_{k\alpha} : \alpha \in \mathrm{supp}(q_k)\} \to 0$, so
$q_k \in X_k$ becomes extinct along $x(t)$ iff $D_{\mathrm{KL}}(q_k \,\|\, x_k(t)) \to \infty$ as $t \to \infty$.
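
The extinction criterion based on (2.5) can be checked numerically with a few lines of Python. This is only a sketch: the function name `kl_divergence` and the sample strategies are assumptions made for illustration.

```python
import math

def kl_divergence(q, x):
    """D_KL(q || x) = sum over supp(q) of q_a * log(q_a / x_a), as in (2.5)."""
    total = 0.0
    for q_a, x_a in zip(q, x):
        if q_a > 0:
            if x_a == 0:
                return math.inf  # an action in supp(q) has already vanished
            total += q_a * math.log(q_a / x_a)
    return total

# Hypothetical mixed strategy q and a family of strategies x whose weight on
# the first action of supp(q) shrinks to zero.
q = [0.5, 0.5, 0.0]
divergences = [kl_divergence(q, [eps, 0.5, 0.5 - eps]) for eps in (1e-1, 1e-3, 1e-6)]
```

In this example the divergence equals $\tfrac{1}{2} \log(0.5/\varepsilon)$, so it grows without bound as the share $\varepsilon$ of the first action tends to zero.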

On the other hand, when a game cannot be solved by removing dominated
strategies, one typically turns to the celebrated solution concept of a Nash
equilibrium which characterizes profiles that are resilient against unilateral deviations.
We will thus say that $q \in X$ is a Nash equilibrium when

\[ u_k(x_k; q_{-k}) \leq u_k(q) \quad \text{for all } x_k \in X_k \text{ and for all } k \in \mathcal{N}, \tag{2.6} \]

and if (2.6) is strict for all $x_k \in X_k \setminus \{q_k\}$, $k \in \mathcal{N}$, $q$ will itself be called strict. Finally,
equilibria of subgames $\mathfrak{G}'$ of $\mathfrak{G}$ will be called restricted equilibria of $\mathfrak{G}$.
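
For pure profiles of a two-player game, condition (2.6) reduces to a finite check, because payoffs are linear in each player's own mixed strategy and it therefore suffices to test pure deviations. The sketch below encodes this; the function name `is_pure_nash` and the example matrices are hypothetical.

```python
def is_pure_nash(A, B, i, j, strict=False):
    """Check (2.6) at the pure profile (i, j) of a bimatrix game.

    A[r][c], B[r][c]: payoffs of the row and column player. With strict=True,
    every unilateral pure deviation must be strictly worse.
    """
    if strict:
        no_gain = lambda deviation, current: deviation < current
    else:
        no_gain = lambda deviation, current: deviation <= current
    row_ok = all(no_gain(A[r][j], A[i][j]) for r in range(len(A)) if r != i)
    col_ok = all(no_gain(B[i][c], B[i][j]) for c in range(len(B[i])) if c != j)
    return row_ok and col_ok

# Hypothetical examples: a coordination game with two strict equilibria, and a
# game where the column player is indifferent, so (0, 0) is Nash but not strict.
coord_A = [[2, 0], [0, 1]]
coord_B = [[2, 0], [0, 1]]
flat_A = [[1, 1], [0, 0]]
flat_B = [[1, 1], [0, 0]]
```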

2.3. Dynamical systems. In this section, our aim will be to give some background
on dynamical systems and, in particular, to highlight the distinction between
a system's configuration space and its phase space, two notions which can be
used interchangeably in a first order environment, but which are very different
in higher orders.

Recall first that the set of tangent vectors to the strategy space $X$ of a normal
form game at some point $x \in X$ admits the structure of a pointed convex cone,
known as the (solid) tangent cone to $X$ at $x$. As can be easily seen, this cone is
naturally isomorphic to the convex cone of all rays that emanate from $x \in X$ and
which intersect $X$ in at least one other point; more precisely:

\[ T_x X = \left\{ z \in \mathbb{R}^{\mathcal{A}} : \sum\nolimits_\alpha^k z_{k\alpha} = 0 \text{ for all } k \in \mathcal{N} \text{ and } z_{k\alpha} \geq 0 \text{ if } x_{k\alpha} = 0 \right\}. \tag{2.7} \]

Following Lee (2003), a well-posed (first order) dynamical system on $X$ may
be regarded as a flow on $X$, i.e. as a smooth map $\Theta\colon X \times \mathbb{R}_+ \to X$ which satisfies
the conditions i) $\Theta(x, 0) = x$ for all $x \in X$, and ii) $\Theta(\Theta(x, t), s) = \Theta(x, t + s)$ for
all $x \in X$ and for all $s, t \geq 0$. The curve $\Theta_x\colon \mathbb{R}_+ \to X$, $t \mapsto \Theta(x, t)$, will be called
the orbit (or trajectory) of $x$ under $\Theta$, and when there is no danger of confusion,
$\Theta_x(t)$ will be denoted more simply as $x(t)$. Moreover, if $\dot x(0)$ denotes the initial
velocity of the trajectory $x(t)$, we see that $\Theta$ induces a vector field $V$ on $X$ via the
mapping $x \mapsto V(x) \equiv \dot x(0) \in T_x X$, so, by the fundamental theorem on flows, $\Theta$
will also be the unique flow such that $\dot x(t) = V(x(t))$ for all $t \geq 0$.

Given these equivalent descriptions of a dynamical system, we have the following
definitions for stationarity and stability of a point $q \in X$:

• $q$ will be called stationary if $V(q) = 0$ (i.e. if $\Theta_q(t) = q$ for all $t \geq 0$).

• $q$ will be called Lyapunov stable if, for every neighborhood $U$ of $q$, there
exists a neighborhood $V$ of $q$ such that $x(t) \in U$ for all $x \in V$, $t \geq 0$.

• $q$ will be called attracting if $x(t) \to q$ for all $x$ in a neighborhood $U$ of $q$.

• $q$ will be called asymptotically stable if it is Lyapunov stable and attracting.

The differential equation $\dot x(t) = V(x(t))$ (or $\dot x = V$ for short) represents a first
order dynamical system on $X$. Higher order dynamics of the form "$x^{(n)} = V$"
can then be defined by means of the equivalent recursive formulation:

\[ \begin{aligned} \dot x(t) &= x^1(t) \\ \dot x^1(t) &= x^2(t) \\ &\;\;\vdots \\ \dot x^{n-1}(t) &= V(x(t), x^1(t), \dots, x^{n-1}(t)). \end{aligned} \tag{2.8} \]

In this way, an n-th order dynamical system on $X$ may be seen as a flow on
the phase space $\Omega \equiv \Omega(X) = \coprod_x (T_x X)^{n-1}$ whose points (n-tuples of the form
$(x, x^1, \dots, x^{n-1})$ as above) represent all possible states of the system.^2 By contrast,
base points $x \in X$ will keep the designation "points", and $X$ itself will be called
the configuration space of the system (a distinction which, as has already been
noted, is redundant in first order systems).

^2 More concisely, a state can be seen as an n-jet, with the phase space $\Omega$ being the n-th order jet
bundle of $X$ (Saunders, 1989).

Of course, the evolution of an n-th order dynamical system is not uniquely
determined by its initial position $x(0)$ in the system's configuration space $X$, but
by the system's initial state $(x(0), \dot x(0), \dots, x^{(n-1)}(0))$ in the phase space $\Omega$. On
that account, if we wish to study how initial positions evolve over time, we will
consider the corresponding rest states $(x(0), 0, \dots, 0)$ which signify that the system
starts at rest; furthermore, we will also use this natural embedding to view $X$ as
a subset of $\Omega$ and write $x \in \Omega$ instead of $(x, 0, \dots, 0) \in \Omega$.
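
The recursive formulation (2.8) is also how higher order systems are typically integrated numerically: the state $(x, x^1, \dots, x^{n-1})$ is advanced as a first order system. The sketch below does this for scalar states with a classical Runge-Kutta step, starting from a rest state. The helper names are hypothetical, and the test system ($\ddot x = -x$ on the real line) is a toy example chosen for its known solution; it ignores the simplex constraints of a game.

```python
import math

def as_first_order(V):
    """Turn x^(n) = V(x, x^1, ..., x^{n-1}) into the first order system (2.8).

    A state is a tuple (x, x^1, ..., x^{n-1}) of scalars; its time derivative
    is (x^1, ..., x^{n-1}, V(state)).
    """
    def field(state):
        return tuple(state[1:]) + (V(*state),)
    return field

def rk4_step(field, state, dt):
    """One classical Runge-Kutta step for state' = field(state)."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))
    k1 = field(state)
    k2 = field(shifted(state, k1, dt / 2))
    k3 = field(shifted(state, k2, dt / 2))
    k4 = field(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def integrate(V, rest_position, order, t_end, steps=1000):
    """Integrate from the rest state (x(0), 0, ..., 0) up to time t_end."""
    field = as_first_order(V)
    state = (rest_position,) + (0.0,) * (order - 1)
    dt = t_end / steps
    for _ in range(steps):
        state = rk4_step(field, state, dt)
    return state

# Toy second order example (not from the paper): x'' = -x started at rest
# from x = 1, whose exact solution is x(t) = cos(t).
final_state = integrate(lambda x, v: -x, 1.0, order=2, t_end=math.pi)
```

The returned tuple is a point of the phase space, i.e. a position together with its higher order derivatives, which is exactly the information needed to continue the trajectory.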



3. Derivation of Higher Order Dynamics 

3.1. Possible approaches and their limitations. Needless to say, the most fundamental
requirement for any class of game dynamics is that its solution trajectories
must actually stay within the game's strategy space $X$ (viewed as a subset of $\mathbb{R}^{\mathcal{A}}$).
Accordingly, if we consider a general first order dynamical system of the form:

\[ \dot x_{k\alpha} = F_{k\alpha}(x), \tag{3.1} \]

with Lipschitz $F_{k\alpha}\colon \mathbb{R}^{\mathcal{A}} \to \mathbb{R}$, the consistency postulate above is equivalent to
the tangency requirements a) $\sum_\alpha^k F_{k\alpha} = 0$ for all $k \in \mathcal{N}$, and b) $F_{k\alpha} \geq 0$ whenever
$x_{k\alpha} = 0$; cf. the definition (2.7) of $T_x X$ and the relevant discussions in Hofbauer
and Sigmund (1988), Sandholm (2011) and Weibull (1995).

On the other hand, the situation becomes much more intricate if we attempt
to introduce second (or higher) order dynamics on $X$. To wit, if we sensibly (but,
ultimately, naively) replace $\dot x$ with $\ddot x$ in (3.1), then the resulting dynamics will
not respect the simplicial structure of $X$, no matter the choice of $F$: if players
start with a sufficiently high initial growth rate vector $\dot x(0)$ pointing towards the
exterior of $X$, then the corresponding solution orbits will escape $X$ in finite time,
possibly never to return.^3
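
The escape phenomenon is easy to reproduce numerically. The sketch below integrates the naive second order equation $\ddot x = F(x)$ for a single share $x \in [0, 1]$ and reports the first time the orbit leaves the interval. The integrator (velocity Verlet), the function name and the two force fields are assumptions made for this illustration.

```python
def first_exit_time(F, x0, v0, dt=1e-4, t_max=5.0):
    """Integrate x'' = F(x) with velocity Verlet and return the first time
    at which x leaves [0, 1], or None if it stays inside up to t_max."""
    x, v, t = x0, v0, 0.0
    a = F(x)
    while t < t_max:
        x_new = x + v * dt + 0.5 * a * dt * dt
        a_new = F(x_new)
        v += 0.5 * (a + a_new) * dt
        x, a, t = x_new, a_new, t + dt
        if x < 0.0 or x > 1.0:
            return t
    return None

# Hypothetical bounded force pushing upwards, with a large initial velocity
# pointing towards the boundary x = 0.
exit_fast = first_exit_time(lambda x: 1.0, x0=0.5, v0=-2.0)
# Same starting position at rest, with a force that vanishes at x = 0.5.
exit_rest = first_exit_time(lambda x: 1.0 - 2.0 * x, x0=0.5, v0=0.0)
```

With the constant force used here, the exact orbit is $x(t) = 0.5 - 2t + t^2/2$, which crosses zero at $t = 2 - \sqrt{3}$; a bounded force cannot stop a sufficiently fast orbit before it reaches the boundary.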

One way around this obstacle was proposed by Flam and Morgan (2004) who
forced solutions to remain in $X$ by exogenously projecting the velocity $v(t) = \dot x(t)$
of an orbit to the tangent cone $T_x X$ of "admissible" velocity vectors, similarly
to Nagurney and Zhang (1997) and Sandholm et al. (2008). In this context, one
begins with the "naive" second order version of (3.1):

\[ \dot x_{k\alpha} = v_{k\alpha} \tag{3.2a} \]
\[ \dot v_{k\alpha} = F_{k\alpha}(x), \tag{3.2b} \]

and replaces (3.2a) with the "projected" variant:

\[ \dot x = \mathrm{proj}_{T_x X}(v), \tag{3.2a'} \]

where $\mathrm{proj}_{T_x X}(v)$ is the projection of $v$ to the tangent cone $T_x X$ of $X$ at $x$.^{4,5}

^3 From a physical viewpoint, this behavior is entirely natural: after all, finite forces cannot contain
particles of arbitrarily high energy in a bounded region. To prove this rigorously, simply use Taylor's
theorem to write $x_{k\alpha}(t) = x_{k\alpha}(0) + \dot x_{k\alpha}(0)\, t + \tfrac{1}{2} F_{k\alpha}(x(\xi_{k\alpha}))\, t^2$ for some $\xi_{k\alpha} \in [0, t]$, and then choose $\dot x_{k\alpha}(0)$
sufficiently negative (recalling that $F$ is bounded) so that $x_{k\alpha}(t)$ becomes negative for sufficiently small
$t > 0$.

This approach has the benefit that $x(t) \in X$ for all $t$ for which $x(t)$ is defined,
but it also carries some considerable (and, to a large extent, unavoidable)
weaknesses. First, the projection operator $\mathrm{proj}_{T_x X}$ does not vary continuously with $x$ (a
consequence of the changing structure of $T_x X$ along the faces of $X$), so existence
and (especially) uniqueness of solutions to (3.2a') might (and does) fail. Second,
in order for players to update (3.2a'), they must be able to tell precisely when
they hit a boundary face of $X$ in order to change the projection operator that
they are using (a typical downside to non-interior methods). However, given that
numerical calculations always carry unavoidable approximation errors (such as
the "machine $\varepsilon$" of the players' calculating device (Cantrell, 2000)), this updating
scheme is prone to numerical instabilities which might lead solution trajectories
to escape $X$ in practical implementations.

An alternative approach to keep solution orbits in $X$ would be to erect an
"infinite wall" at the faces of $X$, as in the "potential well" problems of quantum
mechanics (Sakurai, 1994). These "walls" take the form of smooth barriers of
infinite height that are placed at the faces of $X$, and which are such that solution
trajectories cannot jump over them as they approach $\mathrm{bd}(X)$. Specifically, this
amounts to modifying the forces (3.2b) by setting:

\[ \dot v_{k\alpha} = F_{k\alpha} + W_{k\alpha}, \tag{3.2b'} \]

where the "boundary terms" $W_{k\alpha} \equiv W_{k\alpha}(x, v)$ satisfy:

(1) $\sum_\alpha^k W_{k\alpha} = 0$, in order to be consistent with the constraint $\sum_\alpha^k x_{k\alpha} = 1$.

(2) $W_{k\alpha}(x, v) \to +\infty$ as $x_{k\alpha} \to 0$, in order to ensure that $\dot v_{k\alpha} \to +\infty$ as $x_{k\alpha} \to 0$
(implying in turn that $x_{k\alpha}(t)$ cannot vanish if $x_{k\alpha}(0) > 0$ initially).

Nonetheless, this approach still suffers from important drawbacks. To begin
with, condition (2) above implies that $x(t)$ will rebound at $\mathrm{bd}(X)$, meaning that
(3.2b') can never converge to a pure strategy profile (or other boundary point of
$X$). To remedy this, we would need to impose the further condition:

(3) $W_{k\alpha}(x, v) \to 0$ as $v_{k\alpha} \to 0$, in order to allow for finite accelerations $\dot v_{k\alpha}$
when $x(t)$ approaches $\mathrm{bd}(X)$ with vanishing velocity.

However, this raises new questions regarding the limit behavior of $W_{k\alpha}(x, v)$ as
both $x_{k\alpha} \to 0$ and $v_{k\alpha} \to 0$, leading to the following, more important issue: if we
simply impose an arbitrary barrier term on (3.2) satisfying the above conditions,
then the adjusted dynamics (3.2a)-(3.2b') certainly do not emerge naturally from
game-theoretic considerations. Therefore, unless one can exhibit an inherent link
with rationality or learning, this approach remains an artificial device with the
sole, self-serving purpose of trapping solutions within $X$.

3.2. The second order replicator dynamics in dyadic games. An alternative approach
with its roots in reinforcement learning is to overcome the restrictions
imposed by the simplicial structure of $X$ by having players update an unconstrained
measure of their actions' performance rather than directly updating their
constrained strategies (as was the case in the above approaches). In this unconstrained
space of performance measures, second (or higher) order effects come
about naturally when players look two (or more) steps into the past: it is the
dynamics of these "scores" that induce a well-behaved dynamical system on the
game's strategy space.

^4 Flam and Morgan actually consider only the "gradient field" $F_{k\alpha} = u_{k\alpha}$, but the same projection
machinery can be applied to more general Lipschitz vector fields as well.

^5 Strictly speaking, by using (3.2a') instead of (3.2a), the dynamics (3.2a') are not a bona fide
second order system in the sense of (2.8) except in the interior $\mathrm{int}(X)$ of $X$; this technicality will not
concern us here, but it is nonetheless important to keep in mind.

As an introductory example, let us consider a game where every player $k \in \mathcal{N}$
has two possible actions, say "0" and "1", which are ranked over time using
the associated payoff differences $\Delta u_k \equiv u_{k,1} - u_{k,0}$. With this regret-like information at
hand, and assuming perfect information, players measure the performance of
their strategies by updating the auxiliary score variables (or propensities):

\[ U_k(t+1) = U_k(t) + \Delta u_k(x(t)), \tag{3.3} \]

where $x_k(t) \equiv x_{k,1}(t)$ represents the mixed strategy of player $k$ at time $t$ (assumed
discrete for the moment). The strategies $x_k$ are then updated themselves following
the well-known inverse logit (or expit) choice model whereby actions that score
better are played exponentially more often (Hofbauer et al., 2009; Mertikopoulos
and Moustakas, 2010; Rustichini, 1999; Sorin, 2009):^6

\[ x_k(t+1) = \frac{\exp(U_k(t+1))}{1 + \exp(U_k(t+1))}. \tag{3.4} \]

This process is repeated indefinitely, so if we descend to continuous time for
simplicity,^7 the system of (3.3) and (3.4) yields the coupled equations:

\[ \dot U_k = \Delta u_k(x) \tag{3.5a} \]
\[ x_k = \mathrm{expit}(U_k) \equiv (1 + \exp(-U_k))^{-1}. \tag{3.5b} \]

Hence, by differentiating (3.5b) in order to decouple it from (3.5a), we readily
obtain the 2-strategy replicator dynamics of Taylor and Jonker (1978):

\[ \dot x_k = \frac{dx_k}{dU_k}\, \dot U_k = x_k (1 - x_k)\, \Delta u_k(x). \tag{3.6} \]
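
The equivalence between the score dynamics (3.5) and the replicator equation (3.6) can be checked numerically. The sketch below integrates both formulations with a plain Euler scheme in a two-player dyadic game; the payoff differences in `delta_u`, the step size and the function names are hypothetical choices made for this illustration.

```python
import math

def expit(u):
    return 1.0 / (1.0 + math.exp(-u))

def logit(x):
    return math.log(x) - math.log(1.0 - x)

def delta_u(x):
    """Hypothetical payoff differences u_{k,1} - u_{k,0} in a 2-player coordination game."""
    return [3.0 * x[1] - 1.0, 3.0 * x[0] - 1.0]

def run_scores(x0, t_end, dt=1e-4):
    """Euler scheme for (3.5): dU_k/dt = Delta u_k(x), with x_k = expit(U_k)."""
    U = [logit(xk) for xk in x0]
    for _ in range(round(t_end / dt)):
        du = delta_u([expit(Uk) for Uk in U])
        U = [Uk + dt * d for Uk, d in zip(U, du)]
    return [expit(Uk) for Uk in U]

def run_replicator(x0, t_end, dt=1e-4):
    """Euler scheme for (3.6): dx_k/dt = x_k (1 - x_k) Delta u_k(x)."""
    x = list(x0)
    for _ in range(round(t_end / dt)):
        du = delta_u(x)
        x = [xk + dt * xk * (1.0 - xk) * d for xk, d in zip(x, du)]
    return x

x_scores = run_scores([0.4, 0.6], t_end=3.0)
x_replicator = run_replicator([0.4, 0.6], t_end=3.0)
```

Up to discretization error, the two runs trace the same orbit. The score-based version never leaves the open interval $(0, 1)$ by construction, since its output is always the image of a real number under expit.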

In this well-known derivation of the replicator dynamics from the exponential
reinforcement rule (3.5), the constraints $x_k \in [0, 1]$, $k \in \mathcal{N}$, are automatically
satisfied thanks to (3.4). On the downside, however, (3.5a) itself "forgets" a lot
of past (and potentially useful) information because the corresponding "discrete-time"
recursion (3.3) only looks one step into the past. To remedy this, players
could take (3.3) one step further by aggregating the scores $U_k$ themselves so as to
build even more momentum towards the strategies that tend to perform better.
We thus obtain the second order cumulative reinforcement scheme:

\[ U_k(t+1) = U_k(t) + \Delta u_k(x(t)) \tag{3.7a} \]
\[ Y_k(t+1) = Y_k(t) + U_k(t+1), \tag{3.7b} \]

where, as before, the profile $x(t)$ is updated following the logistic distribution
(3.4), now with $Y_k$ in place of $U_k$.^8 Then, by eliminating the intermediate aggregation
variables $U_k$ from (3.7), we obtain the second order recursion:

\[ Y_k(t+1) - Y_k(t) = Y_k(t) - Y_k(t-1) + \Delta u_k(x(t)), \tag{3.8} \]

which, in turn, leads to the continuous-time variant:

\[ \ddot Y_k = \Delta u_k. \tag{3.9} \]

^6 We are using the term "inverse" because the expit function in (3.4) is actually the inverse logit
function: $U_k = \log(x_k) - \log(1 - x_k) \equiv \mathrm{logit}(x_k)$, so $x_k = \mathrm{logit}^{-1}(U_k) \equiv \mathrm{expit}(U_k)$.

^7 We should stress here that this passage to continuous time is done at a heuristic level; see
Rustichini (1999) and Sorin (2009) for an exploration of some of the issues that arise in the passage
from the discrete to the continuous. The discrete version of the exponential updating rule (3.5) is
a very important topic to address, but since we seek to focus on the properties of the underlying
continuous-time dynamics, it lies beyond the scope of this paper.

As we have already seen in the first order case, the coupled system of equations
(3.4) and (3.9) automatically respects the simplicial structure of $X$ by virtue of
the logistic updating rule (3.4), so the important hurdle of actually staying in $X$
has (finally!) been overcome. Nonetheless, for the purposes of comparison with
the previously suggested approaches, it will be quite instructive to also derive the
dynamics governing the evolution of the strategy profile $x(t)$ itself.

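To that end, note that (3.4) gives $Y_k = \mathrm{logit}(x_k) = \log(x_k) - \log(1 - x_k)$, so a
simple differentiation yields:

\[ \dot Y_k = \frac{\dot x_k}{x_k} + \frac{\dot x_k}{1 - x_k} = \frac{\dot x_k}{x_k (1 - x_k)}. \tag{3.10} \]

Then, differentiating yet again, we obtain:

\[ \ddot Y_k = \frac{\ddot x_k\, x_k (1 - x_k) - \dot x_k^2 (1 - x_k) + \dot x_k^2\, x_k}{x_k^2 (1 - x_k)^2}, \tag{3.11} \]

so, substituting $\ddot Y_k = \Delta u_k$ from (3.9) and solving for $\ddot x_k$, some algebra leads to the
second order replicator dynamics for dyadic games:

\[ \ddot x_k = x_k (1 - x_k)\, \Delta u_k + \frac{1 - 2 x_k}{x_k (1 - x_k)}\, \dot x_k^2. \tag{3.12} \]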
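
As a sanity check on (3.12), the sketch below integrates it directly and compares the result with the score-based route $x_k = \mathrm{expit}(Y_k)$, $\ddot Y_k = \Delta u_k$. To have a closed form to compare against, it assumes a single player facing a constant payoff difference (a hypothetical simplification), in which case $Y_k(t) = Y_k(0) + \Delta u_k\, t^2/2$ when starting from rest. The function names and numerical values are illustrative.

```python
import math

def expit(y):
    return 1.0 / (1.0 + math.exp(-y))

def logit(x):
    return math.log(x) - math.log(1.0 - x)

def second_order_field(state, du):
    """Right-hand side of (3.12) written as a first order system in (x, v = dx/dt)."""
    x, v = state
    accel = x * (1.0 - x) * du + (1.0 - 2.0 * x) * v * v / (x * (1.0 - x))
    return (v, accel)

def integrate_3_12(x0, du, t_end, steps=20000):
    """Classical RK4 for (3.12), started from the rest state (x0, 0)."""
    state, dt = (x0, 0.0), t_end / steps
    for _ in range(steps):
        k1 = second_order_field(state, du)
        k2 = second_order_field(tuple(s + dt / 2 * k for s, k in zip(state, k1)), du)
        k3 = second_order_field(tuple(s + dt / 2 * k for s, k in zip(state, k2)), du)
        k4 = second_order_field(tuple(s + dt * k for s, k in zip(state, k3)), du)
        state = tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                      for s, a, b, c, d in zip(state, k1, k2, k3, k4))
    return state[0]

x0, du, t_end = 0.2, 1.0, 3.0   # hypothetical values, constant payoff difference
x_direct = integrate_3_12(x0, du, t_end)
x_via_scores = expit(logit(x0) + 0.5 * du * t_end ** 2)   # from (3.9), starting at rest
x_first_order = expit(logit(x0) + du * t_end)             # from (3.5a)
```

In this toy setting the second order score grows like $t^2/2$ instead of $t$, so for $t > 2$ the second order orbit is closer to the better action than its first order counterpart.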

This derivation of a second order dynamical system on $X$ will be the archetype
for the significantly more general class of higher order dynamics of the next
section, so we will not pause here to discuss (3.12) at any length. That said,
it is worth noting the following:

Remark 3.1 (Boundary terms). The second order system (3.12) is precisely of the
form (3.2b'), with the last term of (3.12) playing the role of the "infinite wall"
which blows up as $x_k \to 0$ and vanishes as $v_k \to 0$ (thus keeping $x(t)$ from escaping
$X$, but allowing it to converge to $\mathrm{bd}(X)$ with zero velocity). However, whereas
these same conditions were imposed artificially on (3.2b') and were devoid of any
links to learning or evolution, they now emerge naturally, as the byproduct of a
logit choice model where players look deeper into the past.

Remark 3.2 (Past information). The precise sense of "looking deeper into the past"
in the double aggregation scheme (3.7)-(3.9) can be made clearer if we write out
the first and second order scores $U_k$ and $Y_k$ as explicit functions of time; in the
continuous case, this gives:

$$U_k(t) = \int_0^t \Delta u_k(x(s))\,ds, \tag{3.13a}$$

$$Y_k(t) = \int_0^t U_k(s)\,ds = \int_0^t (t - s)\,\Delta u_k(x(s))\,ds. \tag{3.13b}$$

8 Of course, players could look even deeper into the past by taking further aggregates in (3.7), but
we will not deal with this issue here in order to keep our example as simple as possible.

We thus see that the first order aggregate scores $U_k$ assign uniform weight to all
past instances of play, while the second order aggregates $Y_k$ put (linearly) more
weight on instances that are further removed into the past. This mode of weighting
can be interpreted as players being reluctant to forget what has occurred, and this
is precisely the reason that we describe the second order scheme (3.9) as "looking
deeper into the past".
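The weighting in (3.13b) has a simple discrete-time analogue, sketched below in Python: aggregating the running totals of a payoff stream is the same as taking a single sum in which older payoffs carry linearly larger weights. The payoff values and function names are illustrative assumptions, not data from the paper.

```python
from itertools import accumulate

def first_order_scores(payoffs):
    """U(t) = payoffs[0] + ... + payoffs[t-1], with U(0) = 0."""
    return [0.0] + list(accumulate(payoffs))

def second_order_score(payoffs):
    """Y(T) = U(0) + ... + U(T-1): an aggregate of the aggregates."""
    T = len(payoffs)
    U = first_order_scores(payoffs)
    return sum(U[s] for s in range(T))

def weighted_score(payoffs):
    """The same quantity as one weighted sum: weight (T-1-r) on payoffs[r]."""
    T = len(payoffs)
    return sum((T - 1 - r) * p for r, p in enumerate(payoffs))

payoffs = [1.0, -0.5, 2.0, 0.25, 3.0]   # hypothetical payoff stream
```

Here `payoffs[0]`, the oldest observation, receives the largest weight, mirroring the factor $(t - s)$ in the continuous-time formula.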

It should be noted that in the theory of learning, past information is usually
discounted (e.g. by an exponential factor) and ultimately discarded in favor of
more recent observations (Fudenberg and Levine, 1998). As we shall see, "refreshing"
observations in this way results in the players' propensity scores $U_k$ growing
at most linearly in time (see e.g. Hofbauer et al., 2009, Rustichini, 1999, and Sorin,
2009); by contrast, if players reinforce past observations by using (3.13b) in place
of (3.13a), then their propensities may grow quadratically instead of linearly. As
a result, first order learning is more conservative (leaning towards "exploring"
more than "exploiting"), whereas higher order learning is less tempered and
corresponds to more decisive players.

3.3. Reinforcement learning and higher order dynamics. In the general case, 
the reinforcement learning setup that we will be working with is as follows: 

(1) For every action $\alpha \in A_k$, player k keeps and updates a score (or propensity)
variable $y_{k\alpha} \in \mathbb{R}$ which measures the performance of $\alpha$ over time.

(2) Players transform these scores into mixed strategies $x_k \in X_k$ by means of
the Gibbs (inverse logit) choice model $G_k : \mathbb{R}^{A_k} \to X_k$, $y_k \mapsto G_k(y_k)$:

$$x_{k\alpha} = G_{k\alpha}(y_k) = \frac{\exp(\lambda_k y_{k\alpha})}{\sum_{\beta} \exp(\lambda_k y_{k\beta})}, \tag{GM}$$

where the "inverse temperature" $\lambda_k > 0$ controls the model's sensitivity
to external stimuli (Mertikopoulos and Moustakas, 2010; Sorin, 2009).

(3) The game is played and players record the payoffs $u_{k\alpha}(x)$ for each $\alpha \in A_k$.

(4) Players update their scores and the process is repeated ad infinitum. 
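Step (2) of the list above can be sketched in a few lines of Python. The function name `gibbs` and the parameter `inv_temp` (standing in for $\lambda_k$) are illustrative choices, and subtracting the maximum score is a standard numerical safeguard rather than part of the model.

```python
import math

def gibbs(scores, inv_temp=1.0):
    """Gibbs choice: x_a is proportional to exp(inv_temp * y_a)."""
    m = max(scores)  # shift by the max so the exponentials cannot overflow
    weights = [math.exp(inv_temp * (y - m)) for y in scores]
    total = sum(weights)
    return [w / total for w in weights]

x = gibbs([1.0, 0.0, -1.0], inv_temp=2.0)
# Adding the same constant to every score leaves the strategy unchanged.
x_shifted = gibbs([11.0, 10.0, 9.0], inv_temp=2.0)
```

A larger `inv_temp` concentrates more mass on the highest-scoring action, which is the sense in which it controls sensitivity to the scores.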

Needless to say, the focal point of this learning process is the exact way in
which players update the performance scores $y_{k\alpha} \in \mathbb{R}$ at each iteration of the
game. We will thus take the natural extension of the aggregating framework of
the previous section and consider a reinforcement scheme in which the players'
scores $y_{k\alpha}$ are updated by looking n steps into the past as follows:

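$$\begin{aligned}
Y^{n-1}_{k\alpha}(t+1) &= Y^{n-1}_{k\alpha}(t) + u_{k\alpha}(x(t)) \\
&\;\;\vdots \\
Y^{0}_{k\alpha}(t+1) &= Y^{0}_{k\alpha}(t) + Y^{1}_{k\alpha}(t) \\
y_{k\alpha}(t+1) &= Y^{0}_{k\alpha}(t+1)
\end{aligned} \tag{3.14}$$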

The above scheme might appear somewhat cryptic at first, but it is quite
straightforward to explain in words. In order to sharpen their performance measures as
much as (consistently) possible, the players of the game accumulate data on each
action's payoff; they then take an aggregate of this aggregate, and so forth up to
n levels into the past; finally, they use this cumulative aggregate to update their
mixed strategies via the Gibbs model (GM).

In particular, if we eliminate the intermediate aggregation variables $Y^1$, $Y^2$,
etc., we obtain the straightforward n-th order recursion:

$$\Delta^n y_{k\alpha}(t) = u_{k\alpha}(x(t)), \tag{3.15}$$

where $\Delta^n y_{k\alpha}(t)$ denotes the n-th order finite difference of $y_{k\alpha}$, defined inductively
as $\Delta^n y(t) = \Delta^{n-1} y(t+1) - \Delta^{n-1} y(t)$, with $\Delta^1 y(t) = y(t+1) - y(t)$. As such, if we
again descend to continuous time for simplicity, we get the n-th order exponential
learning dynamics:

$$y_{k\alpha}^{(n)}(t) = u_{k\alpha}(x(t)), \tag{LD$_n$}$$

where x(t) is given by the Gibbs map (GM).
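The passage from the aggregation chain to the n-th order recursion can be checked numerically. The Python sketch below assumes one particular reading of the chain (each level adds the level above it, and the top level adds the current payoff), feeds it an exogenous, hypothetical payoff stream instead of payoffs generated by play, and confirms that the n-th forward difference of the resulting score returns that stream.

```python
def aggregate_scores(payoff_stream, n):
    """Run an n-level aggregation chain and return the path of y = Y[0].

    Assumed update (one reading of the scheme):
        Y[j](t+1)   = Y[j](t)   + Y[j+1](t)   for j = 0, ..., n-2
        Y[n-1](t+1) = Y[n-1](t) + u(t)
    All levels start at zero.
    """
    levels = [0.0] * n
    path = [levels[0]]
    for u in payoff_stream:
        # Build the new levels from the old ones so every update uses time-t values.
        new_levels = [levels[j] + levels[j + 1] for j in range(n - 1)]
        new_levels.append(levels[n - 1] + u)
        levels = new_levels
        path.append(levels[0])
    return path

def finite_difference(seq, order):
    """Forward difference D y(t) = y(t+1) - y(t), applied `order` times."""
    for _ in range(order):
        seq = [b - a for a, b in zip(seq, seq[1:])]
    return seq

payoffs = [1.0, 2.0, -1.0, 0.5, 3.0, 0.0, 4.0]   # hypothetical payoff stream
scores = aggregate_scores(payoffs, n=3)
recovered = finite_difference(scores, order=3)
```

Each differencing step peels off one aggregation level, which is the discrete counterpart of differentiating $y_{k\alpha}$ n times in continuous time.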

Clearly, the learning dynamics (LD n ) with Gibbs choice completely specify the 
evolution of the players' mixed strategy profile x(t); however, they still fall short 
of our original goal to establish higher order dynamics in the players' strategy 
space X itself. To that end, by extending the calculations of Section 3.2 to our 
current general setting (see Appendix A), we obtain the n-th order (asymmetric) 
replicator dynamics: 

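$$x_{k\alpha}^{(n)} = \lambda_k x_{k\alpha}\big(u_{k\alpha} - u_k\big) - x_{k\alpha}\Big[R^{n-1}_{k\alpha} - \sum_{\beta} x_{k\beta} R^{n-1}_{k\beta}\Big], \tag{RD$_n$}$$

where the terms $R^{n-1}_{k\alpha}$ represent the effect of utilizing the higher order aggregation
scheme (LD$_n$) and are given by the (game-independent!) expression: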

(3.16) 

the sum being taken over all non-negative integers $m_1, \dots, m_{n-1}$ such that
$m_1 + 2 m_2 + \dots + (n-1)\, m_{n-1} = n$, and $m = m_1 + \dots + m_{n-1}$.
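For orientation, one expression that is consistent with the index set just described, and with the cases n = 1 and n = 2 discussed in the text, follows from Faà di Bruno's formula applied to $\frac{d^n}{dt^n} \log x_{k\alpha}$ after the leading term $x_{k\alpha}^{(n)} / x_{k\alpha}$ is split off. It is offered as a sketch under that assumption and need not coincide with the authors' exact normalization of (3.16).

```latex
% Candidate form of the boundary terms (assumption: Faa di Bruno expansion of
% d^n/dt^n log x_{k\alpha} with the term x^{(n)}_{k\alpha} / x_{k\alpha} removed).
R^{n-1}_{k\alpha}
  = \sum \frac{n!}{m_1! \cdots m_{n-1}!}\,
    \frac{(-1)^{m-1} (m-1)!}{x_{k\alpha}^{m}}\,
    \prod_{r=1}^{n-1} \left( \frac{x_{k\alpha}^{(r)}}{r!} \right)^{m_r}.
% Check, n = 2: the only admissible index is m_1 = 2 (so m = 2), which gives
R^{1}_{k\alpha} = - \dot{x}_{k\alpha}^{2} / x_{k\alpha}^{2}.
% Check, n = 1: the index set is empty, so R^{0}_{k\alpha} = 0.
```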

As one would expect, for n = 1, we trivially obtain $R^{0}_{k\alpha} = 0$ for all $\alpha \in A_k$,
$k \in N$, so (RD$_n$) is reduced to the standard replicator dynamics:

$$\dot{x}_{k\alpha} = \lambda_k x_{k\alpha}\big(u_{k\alpha}(x) - u_k(x)\big). \tag{RD$_1$}$$

As before, this derivation highlights the intimate link between the Gibbs distri- 
bution (GM) and the replicator equation: the latter is just a simple offshoot of the 
former combined with the learning dynamics (LD n ). 9 

On the other hand, for n = 2, the only lower order term that survives in (3.16)
is for $m_1 = 2$; a bit of algebra then yields the Newtonian replicator dynamics:

$$\ddot{x}_{k\alpha} = \lambda_k x_{k\alpha}\big(u_{k\alpha}(x) - u_k(x)\big) + x_{k\alpha}\Big(\dot{x}_{k\alpha}^2 / x_{k\alpha}^2 - \sum_{\beta} \dot{x}_{k\beta}^2 / x_{k\beta}\Big). \tag{RD$_2$}$$

At first glance, the above equation seems different from the second order equation
(3.12) that we derived in Section 3.2, but this is just a matter of reordering: if we
restrict (RD$_2$) to two strategies, "0" and "1", and set $x_k = x_{k,1} = 1 - x_{k,0}$, we will
have $\dot{x}_k = \dot{x}_{k,1} = -\dot{x}_{k,0}$, and (3.12) follows immediately.
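A quick numerical sanity check of the second order equation is possible when payoffs are constant, because the score dynamics $\ddot{y} = u$ then integrate in closed form. The Python sketch below uses hypothetical payoffs, initial data and inverse temperature, evaluates x(t) through the Gibbs map, and compares a finite-difference estimate of $\ddot{x}$ with the payoff term plus the boundary term. The $\lambda$ factor on the payoff term is the one produced by differentiating the Gibbs map twice.

```python
import math

def gibbs(scores, lam):
    m = max(scores)
    w = [math.exp(lam * (y - m)) for y in scores]
    s = sum(w)
    return [v / s for v in w]

LAM = 1.5                      # hypothetical inverse temperature
U = [1.0, 0.3, -0.5]           # hypothetical constant payoffs
Y0 = [0.2, -0.1, 0.0]          # hypothetical initial scores
V0 = [0.3, -0.2, 0.1]          # hypothetical initial score velocities

def x_at(t):
    """Strategy at time t when y'' = u, i.e. y(t) = y0 + v0*t + u*t^2/2."""
    y = [y0 + v0 * t + 0.5 * u * t * t for y0, v0, u in zip(Y0, V0, U)]
    return gibbs(y, LAM)

def rd2_residual(t, h=1e-3):
    """Largest gap between a finite-difference x'' and the second order right-hand side."""
    xm, x, xp = x_at(t - h), x_at(t), x_at(t + h)
    xdot = [(b - a) / (2 * h) for a, b in zip(xm, xp)]
    xddot = [(a - 2 * c + b) / (h * h) for a, c, b in zip(xm, x, xp)]
    u_bar = sum(xi * ui for xi, ui in zip(x, U))
    wall = sum(v * v / xi for v, xi in zip(xdot, x))
    rhs = [LAM * xi * (ui - u_bar) + xi * (v * v / (xi * xi) - wall)
           for xi, ui, v in zip(x, U, xdot)]
    return max(abs(a - b) for a, b in zip(xddot, rhs))

residuals = [rd2_residual(t) for t in (0.0, 0.5, 1.0, 2.0)]
```

The residuals are limited only by the finite-difference step `h`, so they should be small at every sampled time.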

All told, the second order updating scheme $\ddot{y}_{k\alpha} = u_{k\alpha}$ that gives rise to (RD$_2$)
highlights a very deep analogy between Newtonian mechanics and learning in
games: in (RD$_2$), the game's payoffs can be interpreted as the actual physical forces
that determine the system's evolution, exactly as they would prescribe the orbits of a real
physical system.

9 See also Rustichini (1999) and the more recent discussion in Hofbauer et al. (2009).

Higher order imitative dynamics and payoff monotonicity. The dynamics (RD$_n$)
will comprise the core of our higher order considerations, much as the replicator
dynamics have become the gold standard for evolutionary and learning dynamics.
More generally however, players might not base the updating (LD$_n$) of their
performance scores on the payoffs $u_{k\alpha}(x)$ of the game, but on some different
(assumed continuous) "payoff observables" $w_{k\alpha} : X \to \mathbb{R}$. In that case, we obtain the
generalized reinforcement scheme:

$$y_{k\alpha}^{(n)} = w_{k\alpha}(x), \tag{GLD$_n$}$$

which, coupled with the Gibbs model (GM), yields the generalized dynamics:

$$x_{k\alpha}^{(n)} = \lambda_k x_{k\alpha}\big(w_{k\alpha}(x) - w_k(x)\big) - x_{k\alpha}\big(R^{n-1}_{k\alpha} - R^{n-1}_k\big), \tag{GD$_n$}$$

where $w_k$ is the player average $w_k(x) = \sum_{\alpha} x_{k\alpha} w_{k\alpha}(x)$ (and similarly for $R^{n-1}_k$).

The dynamics (GD$_n$) are characterized by the important property of "imitation",
i.e. that players will never assign positive probability to a pure strategy
that has become extinct in the course of play: if $x_{k\alpha}(0) = 0$ initially, then $x_{k\alpha}(t)$
remains zero for all time (cf. Remark 3.3 below). As such, the dynamics (GD$_n$) can
be seen as the higher order extension of the class of imitative dynamics considered
by Bjornerstedt and Weibull (1996) (see also Weibull, 1995, and Sandholm,
2011): to obtain the higher order analogue of any first order dynamics of the
form $\dot{x}_{k\alpha} = x_{k\alpha}(w_{k\alpha} - w_k)$, one simply needs to replace $\dot{x}_{k\alpha}$ with $x_{k\alpha}^{(n)}$ and add the
(game-independent) boundary term $-x_{k\alpha}(R^{n-1}_{k\alpha} - R^{n-1}_k)$.

That said, if the observables $w_{k\alpha}$ are completely uncorrelated with the game's
payoff functions $u_{k\alpha}$, there is little hope that the dynamics (GD$_n$) will lead to
any sort of meaningful, rational play over time. It is thus natural to focus on
observables w which respect the payoff ranking of a player's strategies, i.e.:

$$w_{k\alpha}(x) > w_{k\beta}(x) \ \text{ if and only if } \ u_{k\alpha}(x) > u_{k\beta}(x), \tag{PM}$$

for all $\alpha, \beta \in A_k$, and for all $x \in X$. This correlation condition is usually referred to
in the literature as monotonicity (or payoff monotonicity), so when the payoff observables
$w_{k\alpha}$ satisfy (PM), the n-th order dynamics (GD$_n$) will be dubbed monotonic
as well (see e.g. Hofbauer and Weibull, 1996, Samuelson and Zhang, 1992, and
Weibull, 1995).

More generally, the monotonicity requirement (PM) can be broadened by replacing
one (or both) of the pure strategies $\alpha, \beta \in A_k$ by mixed ones (and possibly
weakening the "if and only if" requirement to an "if"); in particular, (PM) can
be viewed as a special case of the more stringent condition known as aggregate
monotonicity (Samuelson and Zhang, 1992):

$$w_k(q_k; x_{-k}) > w_k(q'_k; x_{-k}) \ \text{ iff } \ u_k(q_k; x_{-k}) > u_k(q'_k; x_{-k}), \tag{AM}$$

with $x_{-k} \in X_{-k}$ and $q_k, q'_k \in X_k$. We will thus say that the dynamics (GD$_n$) are:

• aggregate-monotonic if (AM) holds for all $q_k, q'_k \in X_k$.

• convex-monotonic when the "if" direction of (AM) holds for all pure $q'_k$.

• concave-monotonic when the "if" direction of (AM) holds for all pure $q_k$.

• monotonic if (AM) holds for $q_k$ and $q'_k$ that are both pure.

(For a survey of these classes of dynamics in first order, see Hofbauer and
Weibull, 1996, Weibull, 1995, or, for a most recent account, Viossat, 2011.)
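The pure-strategy ranking condition can be tested mechanically at any given state. The Python sketch below checks it for one player at one strategy profile; the payoff vector and the two candidate observables are hypothetical, and a full verification would have to repeat the check over all of X.

```python
def respects_payoff_ranking(u, w):
    """Check the ranking condition at one state: w_a > w_b exactly when u_a > u_b."""
    n = len(u)
    return all((w[a] > w[b]) == (u[a] > u[b])
               for a in range(n) for b in range(n))

payoffs = [3.0, 1.0, 2.0]                 # hypothetical payoffs u_a at some state x
cubed = [p ** 3 for p in payoffs]         # strictly increasing transform of the payoffs
shuffled = [1.0, 3.0, 2.0]                # swaps the ranking of actions 0 and 1
```

Any strictly increasing transform of the payoffs, such as `cubed`, passes the check, whereas `shuffled` fails it because it reverses the order of two actions.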

In light of the above, we close this section with two book-keeping remarks on 
the Gibbs choice model (GM) which apply to all dynamics of the form (GD n ): 

Remark 3.3 (The faces of X). The Gibbs model (GM) maps $\mathbb{R}^{A}$ to the (relative)
interior int(X) of X, so any initial score assignment corresponds to a strategy
profile that assigns positive mass to all actions $\alpha \in A$ for all time. For the most
part, we will not need to consider non-interior orbits; nonetheless, if required, we
can consider initial conditions on any face X' of X simply by restricting the Gibbs
map (GM) to appropriate subsets of the players' action sets (i.e. by effectively setting
a non-utilized action's score to $-\infty$ and working in the associated subgame).
In this manner, we see that the interior of any face of X is invariant in (GD$_n$), just
as in the first order case.10

Remark 3.4. We should also note that (GM) is not a 1-1 map between $\mathbb{R}^{A}$ and
int(X), so we may not freely pass from strategies to scores: for any $c \in \mathbb{R}$, we
have $G_k(y_{k,0} + c, y_{k,1} + c, \dots) = G_k(y_{k,0}, y_{k,1}, \dots)$, so strategies can be mapped
to scores only up to an additive constant (reminiscent of how the addition of a
constant to a player's payoffs does not change the game). To recover a bijection,
one may flag each player's "0"-th strategy and introduce the relative scores:

$$z_{k\mu} = y_{k\mu} - y_{k,0}, \quad \mu \in A_k^* \equiv A_k \setminus \{0\}, \tag{3.17}$$

which are then mapped to strategies via the reduced map $G_k^* : \mathbb{R}^{A_k^*} \to X_k$:

$$G^*_{k,0}(z) = \Big(1 + \sum_{\nu} e^{z_{k\nu}}\Big)^{-1}, \qquad G^*_{k\mu}(z) = e^{z_{k\mu}} \Big(1 + \sum_{\nu} e^{z_{k\nu}}\Big)^{-1}. \tag{GM$^*$}$$

In view of (GM$^*$), the relative scores $z_{k\mu}$ may be recovered by the inverse expressions
$z_{k\mu} = \log(x_{k\mu} / x_{k,0})$, $\mu \in A_k^*$, $k \in N$ (cf. (3.5b) in Section 3.2); furthermore,
we can also identify strategy "0" of player k with the "point at negative infinity"
$(-\infty, \dots, -\infty)$ in $\mathbb{R}^{A_k^*}$, an observation which will be particularly useful in the
asymptotic stability discussion of Section 5.
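The reduced map and its inverse are easy to exercise numerically. In the Python sketch below, the function names are illustrative, the flagged strategy is stored at index 0, and the inverse temperature is taken equal to 1 (an assumption made to keep the round trip exact).

```python
import math

def reduced_gibbs(z):
    """Map relative scores z (one per non-flagged action) to a full mixed strategy.

    Index 0 of the result is the flagged "0"-th strategy.
    """
    m = max(0.0, max(z)) if z else 0.0     # stabilise the exponentials
    base = math.exp(-m)                    # weight of the flagged strategy
    weights = [math.exp(v - m) for v in z]
    total = base + sum(weights)
    return [base / total] + [w / total for w in weights]

def relative_scores(x):
    """Inverse map: z_mu = log(x_mu / x_0) for an interior strategy x."""
    return [math.log(xi / x[0]) for xi in x[1:]]

z = [0.7, -1.2, 2.5]                       # hypothetical relative scores
x = reduced_gibbs(z)
z_back = relative_scores(x)
# Very negative relative scores put almost all mass on the flagged strategy.
near_vertex = reduced_gibbs([-50.0, -50.0])
```

The last line illustrates the "point at negative infinity" picture: as every relative score decreases without bound, play concentrates on the flagged strategy.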

4. Elimination of dominated strategies 

A fundamental rationality requirement for any class of game dynamics is that
dominated strategies become extinct over time. Along these lines, our first result
is that in the n-th order replicator dynamics, dominated strategies die out at a
rate which is exponential in $t^n$:

Theorem 4.1. Let x(t) be an interior solution path of the n-th order replicator dynamics
(RD$_n$). If $q_k \in X_k$ is iteratively dominated, we will have:

$$D_{KL}(q_k \,\|\, x_k(t)) \ge \lambda_k c\, t^n / n! + O(t^{n-1}), \tag{4.1}$$

for some constant c > 0; in other words, only rationalizable strategies survive. In particular,
for pure strategies $\alpha \prec \beta$, we have:

$$x_{k\alpha}(t) / x_{k\beta}(t) \le \exp\big(-\lambda_k \Delta u_{\beta\alpha}\, t^n / n! + O(t^{n-1})\big), \tag{4.2}$$

where $\Delta u_{\beta\alpha} = \min_{x \in X} \{u_{k\beta}(x) - u_{k\alpha}(x)\} > 0$.



10 This is also the reason that we need not concern ourselves with the technicalities of the fact that
the dynamics (GD$_n$) blow up near the boundary bd(X) of X.

(a) Portrait of a dominance solvable game. (b) Rate of extinction of dominated strategies.

FIGURE 1. Extinction of dominated strategies in the first and second order
replicator dynamics. In Fig. 1(a) we plot the second order solution orbits of a
dominance solvable game (see labels for the payoffs). In Fig. 1(b), we show the
rate of extinction of the dominated strategy "0" by plotting the K-L divergence
of a typical trajectory: the K-L distance grows exp-quadratically in second order
dynamics compared to exp-linearly in first order (Theorem 4.1).

As an immediate corollary, we then obtain: 

Corollary 4.2. In dominance-solvable games, the n-th order replicator dynamics (RD n ) 
converge to the game's rational solution. 

The proof of Theorem 4.1 can be found in Appendix B; for now, we will focus 
on some relevant remarks: 

Remark 4.1 (The asymptotic extinction rate). Even though (4.1) and (4.2) have
been stated as inequalities, one can use any upper bound for the game's payoffs
to show that the rate of extinction of dominated strategies in terms of the K-L
divergence really is $O(t^n)$.11 As a result, we see that the asymptotic rate of
extinction of dominated strategies in the n-th order replicator dynamics (RD$_n$) is
n orders as fast as in the standard first order dynamics (RD$_1$), so irrational play
becomes extinct much faster in higher orders.
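To see where the $t^n / n!$ rate comes from in the simplest case, the following sketch integrates the score dynamics for a pure strategy $\alpha$ that is strictly dominated by a pure strategy $\beta$, with $\Delta u_{\beta\alpha} = \min_{x \in X}\{u_{k\beta}(x) - u_{k\alpha}(x)\}$ as in Theorem 4.1. It is only meant to convey the mechanism under these assumptions; the general statement is proved in Appendix B.

```latex
% Sketch for pure strategies \alpha \prec \beta under (LD_n) with Gibbs choice (GM).
y_{k\beta}^{(n)}(t) - y_{k\alpha}^{(n)}(t)
  = u_{k\beta}(x(t)) - u_{k\alpha}(x(t)) \ge \Delta u_{\beta\alpha} > 0.
% Integrating n times from 0 to t, with p a polynomial of degree at most n-1
% that collects the initial data y^{(r)}(0), r = 0, ..., n-1:
y_{k\beta}(t) - y_{k\alpha}(t) \ge \Delta u_{\beta\alpha}\, \frac{t^n}{n!} + p(t).
% Since (GM) gives x_{k\alpha} / x_{k\beta} = \exp(\lambda_k (y_{k\alpha} - y_{k\beta})):
\frac{x_{k\alpha}(t)}{x_{k\beta}(t)}
  \le \exp\!\Big( -\lambda_k \Delta u_{\beta\alpha}\, \frac{t^n}{n!} - \lambda_k\, p(t) \Big)
  = \exp\!\Big( -\lambda_k \Delta u_{\beta\alpha}\, \frac{t^n}{n!} + O(t^{n-1}) \Big).
```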

Remark 4.2 (Irrelevance of adversarial play). Interestingly, the proof of Theorem
4.1 for dominated (but not iteratively dominated) strategies goes through unscathed
for any (continuous) play $x_{-k}(t) \in X_{-k}$, $t \ge 0$, of k's opponents. As such,
the extinction of dominated strategies is independent of how the focal player's
opponents evolve over time - they need not even be rational.

Remark 4.3 (Payoff-monotonic dynamics). In the first order case, Samuelson and
Zhang (1992) showed that in all payoff-monotonic dynamics, pure dominated
strategies become extinct along interior solution orbits, while the same holds for
mixed dominated strategies in the class of aggregate-monotonic dynamics. This
result was extended by Hofbauer and Weibull (1996) to pure strategies which are
dominated by mixed ones in convex-monotonic dynamics, whereas Viossat (2011)
recently established the dual result for concave dynamics.

11 In fact, the coefficients that make (4.1) and (4.2) into asymptotic equalities can also be determined,
but we will not bother with this calculation here.

As it turns out, the second order landscape mirrors the first order one except 
for an accelerated rate of extinction (see Appendix B for the proof): 

Theorem 4.3. For any interior initial condition, we have:

• Payoff-monotonic n-th order dynamics eliminate all pure strategies that are dominated
by pure strategies.

• Convex (resp. concave) monotonic n-th order dynamics eliminate all pure (resp.
mixed) strategies that are dominated by mixed (resp. pure) strategies.

• Aggregate-monotonic n-th order dynamics eliminate all dominated strategies.

Moreover, the rate of extinction is exponential in $t^n$ (in the sense of (4.1)).

On the other hand, if a strategy is only weakly dominated, Theorem 4.1 cannot
guarantee that it will be annihilated; in fact, it is well known that weakly dominated
strategies do survive even in the standard first order replicator dynamics:
if the pure strategy $\alpha \in A_k$ of player k is weakly dominated by $\beta \in A_k$, and if
all adversarial strategies $\alpha_{-k} \in A_{-k}$ against which $\beta$ performs better than $\alpha$ die
out, then $\alpha$ may survive for an open set of initial conditions (for instance, see
Example 5.4 and Proposition 5.8 in Weibull, 1995).

Quite remarkably, this can never be the case in a higher order setting if players
start unbiased:

Theorem 4.4. Let x(t) be an interior solution orbit of the n-th order ($n \ge 2$) replicator
dynamics (RD$_n$) that starts at rest: $\dot{x}(0) = \dots = x^{(n-1)}(0) = 0$. If $q_k \in X_k$ is weakly
dominated, then it becomes extinct along x(t) with rate

$$D_{KL}(q_k \,\|\, x_k(t)) \ge \lambda_k c\, t^{n-1} / (n-1)!, \tag{4.3}$$

where $\lambda_k$ is the learning rate of player k and c > 0 is a positive constant.

The intuition behind this surprising result (see Appendix B for the proof) can
be gleaned by looking at the reinforcement learning scheme (LD$_n$). If we take
the case n = 2 for simplicity, we see that the "payoff forces" $F_{k\alpha} = u_{k\alpha}$ will never
point towards a weakly dominated strategy. As a result, solution trajectories
are always accelerated away from weakly dominated strategies, and even if this
acceleration vanishes in the long run, the trajectory still retains a growth rate
(velocity) that drives it away from the dominated strategy. By comparison, this
is not the case in first order dynamics; there, we only know that growth rates
point away from weakly dominated strategies, and if these rates vanish in the
long run, solution trajectories might ultimately converge to a point where weakly
dominated strategies are still present (see for instance Fig. 2). In light of this,
some further remarks are in order:

Remark 4.4. The assumption that solution orbits start at rest is simply there to
ensure that players do not have an initial higher order "learning bias" in the form
of uneven initial score derivatives $\dot{y}(0), \ddot{y}(0), \dots \neq 0$ that might unduly skew their
learning scheme towards one strategy or another.12 As such, starting "at rest" is
a very natural (and canonical) assumption to make: players may start with any
mixed strategy they wish, but it is assumed that their learning process is not
otherwise biased until their empirical data starts to accrue in (LD$_n$).

12 Indeed, $x_{k\alpha}^{(r)}(0) = 0$ for all $r = 1, \dots, n-1$, $\alpha \in A_k$, also implies $y_{k\alpha}^{(r)}(0) = y_{k\beta}^{(r)}(0)$ for all $\alpha, \beta \in A_k$, so being
at rest is tantamount to a lack of learning bias.

Remark 4.5. As with strictly dominated strategies, Theorem 4.4 applies to more 
general imitative dynamics under the same caveats: aggregate-monotonic dynam- 
ics eliminate all weakly dominated strategies, convex (resp. concave) monotonic 
dynamics eliminate pure (resp. mixed) strategies that are weakly dominated by 
mixed (resp. pure) strategies, and payoff-monotonic dynamics eliminate pure
strategies that are weakly dominated by other pure strategies. 

Remark 4.6. It is also important to note that our estimate of the rate of extinction 
of weakly dominated strategies is one order lower than that of strictly dominated 
ones; as a result, Theorem 4.4 does not imply the annihilation of weakly domi- 
nated strategies in first order dynamics (as well it shouldn't!). 

In the first order regime, it is known that if there is some adversarial strategy
which constitutes evidence of domination (i.e. the weakly dominant strategy
gives a greater payoff against it than the dominated one) and which always remains
above a certain level, then the weakly dominated strategy becomes extinct
(see e.g. Proposition 3.2 in Weibull, 1995). In our higher order setting, we show
that this assumption instead implies that weakly dominated strategies become
extinct as fast as strictly dominated ones:

Proposition 4.5. Let x(t) be an interior solution of the n-th order replicator dynamics
(RD$_n$), and let $q_k \preccurlyeq q'_k$. If there exists $\alpha_{-k} \in A_{-k}$ with $u_k(q_k; \alpha_{-k}) < u_k(q'_k; \alpha_{-k})$ and
$x_{\alpha_{-k}}(t) \ge \varepsilon > 0$ for all $t \ge 0$, then:

$$D_{KL}(q_k \,\|\, x_k(t)) \ge \varepsilon \lambda_k \big[u_k(q'_k; \alpha_{-k}) - u_k(q_k; \alpha_{-k})\big]\, t^n / n! + O(t^{n-1}). \tag{4.4}$$

Remark 4.7. In the first order replicator dynamics, the elimination of weakly domi- 
nated strategies when the evidence of their domination survives still requires that 
all players adhere to the same dynamics (see e.g. the proof of Proposition 3.2 in 
Weibull, 1995). To wit, consider a simple Entry Deterrence game where a com- 
petitor (Player 1) "enters" or "stays out" of a market controlled by a monopolist 
(Player 2) who can either "fight" the entrant or "share" the market, and where 
"fighting" is a weakly dominated strategy that yields a strictly worse payoff if 
the competitor "enters" (Weibull, 1995, Ex. 5.4). Under the replicator dynamics, 
"fight" becomes extinct if "enter" survives (cf. Figure 2); however, if Player 1 were 
to follow a different process under which "enter" survives but the integral of its 
population share over time is bounded, then "fight" does not become extinct. In 
higher orders though, the proof of Theorem 4.4 goes through for any continuous
play $x_{-k}(t) \in X_{-k}$, $t \ge 0$, of k's opponents, so weakly dominated strategies become
extinct independently of how one's opponents evolve over time (rationally or otherwise).
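The contrast between first and second order can be seen in a small simulation of an Entry Deterrence game in relative-score coordinates. The Python sketch below is illustrative only: the payoff numbers (entering and sharing pays 2 to each player, entering and fighting pays 0 to each, staying out pays 1 to the competitor and 4 to the monopolist), the initial condition, the unit inverse temperatures and the explicit Euler scheme are all assumptions, not taken from the paper's figures.

```python
import math

def logistic(z):
    """x = 1 / (1 + exp(-z)), computed without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def simulate(order, f0=0.99, e0=0.5, horizon=20.0, dt=1e-3):
    """Euler-integrate the relative scores of a hypothetical Entry Deterrence game.

    w = score(enter) - score(stay out) for Player 1,
    z = score(fight) - score(share)    for Player 2.
    Assumed payoff gaps: 1 - 2*f for w and -2*e for z, where e = P(enter)
    and f = P(fight). Orbits start at rest (zero initial score velocities).
    """
    w = math.log(e0 / (1.0 - e0))
    z = math.log(f0 / (1.0 - f0))
    w_dot = z_dot = 0.0
    for _ in range(int(round(horizon / dt))):
        e, f = logistic(w), logistic(z)
        force_w, force_z = 1.0 - 2.0 * f, -2.0 * e
        if order == 1:          # payoffs act as growth rates of the scores
            w += dt * force_w
            z += dt * force_z
        else:                   # order == 2: payoffs act as accelerations
            w += dt * w_dot
            z += dt * z_dot
            w_dot += dt * force_w
            z_dot += dt * force_z
    return logistic(z)          # final probability of "fight"

fight_first_order = simulate(order=1)
fight_second_order = simulate(order=2)
```

In the first order run, "enter" dies out quickly and the score gap of "fight" stops moving, so "fight" keeps most of its initial mass. In the second order run the score gap retains the negative velocity it picked up while "enter" was still being played, so "fight" is driven to extinction.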

Remark 4.8. Finally, it is easy to show that Theorem 4.4 still holds if the players'
initial velocities (or higher order derivatives) are nonzero but small; if they are
too large, weakly dominated strategies may indeed survive. This observation is
important for strategies which are only iteratively weakly dominated because, if a
strategy becomes weakly dominated after removing a strictly dominated strategy,
then the system's solutions could approach the face associated with the resulting
subgame with a high velocity towards the newly weakly dominated strategy (e.g.
if the iteratively weakly dominated strategy pays very well against the evanescent
strictly dominated one; cf. Fig. 2). Thus, although Theorem 4.4 guarantees
the elimination of weakly dominated strategies, it does not extend to iteratively
weakly dominated ones.

(a) Entry Deterrence (b) Outside Option

FIGURE 2. Extinction of weakly dominated strategies and survival of iteratively
weakly dominated ones under the second order replicator dynamics.
Fig. 2(a) shows solution orbits starting at rest in an Entry Deterrence game (see
labels for the payoffs): the weakly dominated strategy "fight" of Player 2 becomes
extinct, in stark contrast to the first order case (compare the highlighted light blue
trajectory with the first order portrait in the inlay). On the other hand, Fig. 2(b)
shows an Outside Option supergame where the strategy "fight" in Fig. 2(a) is
only iteratively weakly dominated; as it happens, this strategy pays very well
against certain initial conditions, so it ends up surviving when all evidence that it
is (iteratively weakly) dominated vanishes. (In both figures, Nash equilibria have
been highlighted in dark red.)



5. Stability of Nash Play and the Folk Theorem 

In games that cannot be solved by the successive elimination of dominated 
strategies, one usually tries to identify the game's Nash equilibria instead. Un- 
fortunately, given the prohibitive complexity of these solutions (Daskalakis, Gold- 
berg, and Papadimitriou, 2006), one of the driving questions of evolutionary game 
theory has been to explain how Nash play might emerge over time as the byprod- 
uct of a simpler, adaptive dynamic process. 

A key result along these lines is the folk theorem of evolutionary game theory 
(Hofbauer and Sigmund, 1988; Sandholm, 2011; Weibull, 1995), which, for the 
multi-population replicator dynamics (RD$_1$), can be summarized as follows:

I. Nash equilibria are stationary. 

II. If an interior solution orbit converges, its limit is Nash. 

III. If a point is Lyapunov stable, then it is also Nash. 

IV. A point is asymptotically stable if and only if it is a strict equilibrium. 

Accordingly, our aim in this section will be to derive an analogue of the folk
theorem in the context of the higher order dynamics (RD$_n$). To that end however,
it is important to recall that the higher order playing field is fundamentally different:
as has been stated before, the choice of an initial strategy profile $x(0) \in X$
does not suffice to determine the evolution of (RD$_n$). Instead, one now needs to
prescribe the full initial state $\omega(0) = (x(0), \dot{x}(0), \dots)$ in the system's phase space
$\Omega$, including the initial velocity and other higher order derivative components
(up to order n - 1).

Surprisingly, despite these differences, the folk theorem of evolutionary game 
theory continues to hold almost verbatim in our higher order setting: 

Theorem 5.1. Let x(t) be a solution orbit of the n-th order replicator dynamics (RD$_n$)
for a normal form game $\mathfrak{G} \equiv \mathfrak{G}(N, A, u)$, and let $q \in X$. Then:

I. x(t) = q for all $t \ge 0$ if and only if q is a restricted equilibrium of $\mathfrak{G}$.

II. If $x(0) \in \operatorname{int}(X)$ and $\lim_{t \to \infty} x(t) = q$, then q is a Nash equilibrium of $\mathfrak{G}$.

III. If every neighborhood U of q in X admits an interior orbit $x_U(t)$ such that $x_U(t) \in U$
for all $t \ge 0$, then q is a Nash equilibrium of $\mathfrak{G}$.

IV. Let q be a strict equilibrium. Then, for every neighborhood U of q in X, there
exists a neighborhood V of q in X and a neighborhood W of $V \setminus \{q\}$ in $\Omega$ such that
$x(t) \in U$ and $x(t) \to q$ for all initial states $(x(0), \dot{x}(0), \dots) \in W$; conversely, only
strict equilibria have this property.

Moreover, as an immediate corollary of (IV), we also have:

IV'. If q is a strict equilibrium of $\mathfrak{G}$, then there exists a neighborhood U of q in X
such that $x(t) \to q$ whenever x(t) starts at rest in U (that is, $x(0) \in U$ and
$\dot{x}(0) = \dots = 0$); conversely, only strict equilibria have this property.

For the proof, see Appendix C; for now, the following remarks are in order: 

Remark 5.1 (Points vs. states and the standard folk theorem). A natural way to
discuss the stability of initial points $q \in X$ is via the corresponding rest states
$(q, 0, \dots, 0) \in \Omega$ (recall also the relevant discussion in Section 2.3 and the remarks
following Theorem 4.4). With this in mind, we will say that $q \in X$ is stationary
(resp. Lyapunov stable, resp. attracting) when the associated rest state $(q, 0, \dots, 0) \in \Omega$
is itself stationary (resp. Lyapunov stable, resp. attracting).

Given this duality between points $q \in X$ and rest states $(q, 0, \dots, 0) \in \Omega$, we
may draw the following parallels between the folk theorem and Theorem 5.1:

- Parts I and II of Theorem 5.1 are direct analogues of the corresponding first
order claims; note however that (II) can now be inferred from (III).

- Part III is slightly stronger than the first order statement that Lyapunov stability
implies Nash equilibrium. Indeed, Lyapunov stability posits robustness
with respect to open neighborhoods of initial conditions (including higher order
components in higher order environments), whereas Theorem 5.1 only asks that
every neighborhood of the point in question admit a trajectory which is wholly
contained therein. As a matter of fact, there are equilibria that, even under the
standard replicator dynamics, satisfy the latter property but not the former;13 as
such, Part III of Theorem 5.1 is closer to a "bare minimum" stability characterization
of Nash equilibria, especially in higher orders.



13 For instance, the equilibrium $q = e_{11} + (e_{21} + e_{22})/2$ of the simple 2x2 game with payoff matrices
$U_1 = I$ and $U_2 = 0$ is not Lyapunov stable under the replicator dynamics and is not the $\omega$-limit of
any interior trajectory, but it still satisfies the property asserted in Theorem 5.1.

- Part IV on the other hand is not tantamount to asymptotic stability - it would
be if W were a neighborhood of V instead of $V \setminus \{q\}$.14 In particular, Part IV'
shows that strict equilibria attract all nearby rest states,15 but not all nearby
non-rest states: for every nearby point $x \in X$, we can find a neighborhood $V_x \subseteq \Omega$
of initial states that converge to q, but there is no uniform bound on, say, initial
velocities $\dot{x}(0)$ that ensures convergence to q.

This difference between first and higher orders is inextricably tied to the sine
qua non requirement that any higher order dynamical system needs first and
foremost to stay in X. Specifically, for small (but finite) $\dot{x}$, $\ddot{x}$, etc., the boundary
term (3.16) in (RD$_n$) blows up to infinity as $x_{k\alpha} \to 0$, so the dynamics in the
phase space $\Omega$ will become perpendicular to X (viewed as a subset of $\Omega$) near
$(q, 0, \dots, 0)$, and this precludes asymptotic stability (see Fig. 3).

That said, asymptotic stability is not only too strong a requirement for higher
order settings, but it is also not a very relevant one: given that players do not
have direct control over their initial velocities $\dot{x}(0)$, it makes little sense to ask
for robustness against different "velocity choices" by the players. What players
do control instead is the bias $\dot{y}(0), \ddot{y}(0), \dots$ of their learning scheme as captured
by the dynamics (LD$_n$); on that account, asymptotic stability should be phrased
instead in terms of the players' initial bias in (LD$_n$), and not w.r.t. their initial
"velocities" in (RD$_n$) - obviously, a redundant distinction for n = 1.

To wit, if we assume w.l.o.g. that the strict equilibrium under scrutiny corresponds
to everyone playing their "0"-th strategy, the learning dynamics (LD$_n$)
phrased in terms of the relative scores $z_{k\mu}$ of (3.17) take the equivalent form:

$$z_{k\mu}^{(n)} = u_{k\mu}(x) - u_{k,0}(x), \tag{ZD$_n$}$$

with $x = G^*(z)$ given by the reduced Gibbs map (GM$^*$). In this formulation,
the strict equilibrium $q = (0, \dots, 0)$ corresponds to the point at negative infinity
$(-\infty, \dots, -\infty)$ and the proof of Theorem 5.1 (see Appendix C) shows that if
players start at a neighborhood of $(-\infty, \dots, -\infty)$ with learning bias $\dot{z}_{k\mu}(0), \dots$
not exceeding some uniform M > 0, then the relative scores $z_{k\mu}$ escape to $-\infty$.
In other words, if we view the strict equilibria of $\mathfrak{G}$ as points at infinity, then they are
stable and attracting in the higher order learning dynamics (ZD$_n$).

Remark 5.2. As in the first order case, Theorem 5.1 applies to more general classes 
of imitative dynamics. In the context of payoff-monotonic dynamics in particular, 
our proof goes through unchanged except for the converse implication of Part IV 
for n = 1 (see also Theorem 5.4 below). 

Now, with regard to the equilibration speed of the higher order dynamics,
it can be shown that the rate of convergence to a strict equilibrium in the n-th order
dynamics (RD$_n$) is n orders as fast as in the first order case. Specifically, we have:

Proposition 5.2. Let $q = (e_{1,0}, \dots, e_{N,0})$ be a strict Nash equilibrium of the finite game
$\mathfrak{G}$, and let x(t) be a solution path of the replicator dynamics (RD$_n$) which starts at rest
and close enough to q. Then, there exists c > 0 such that:

$$x_{k,0}(t) \sim 1 - \exp\big(-c\, t^n / n! + O(t^{n-1})\big). \tag{5.1}$$
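The mechanism behind this rate can be sketched in the relative score coordinates of (3.17). The derivation below is heuristic: it assumes unit inverse temperature, a constant c > 0 with $u_{k\mu}(x) - u_{k,0}(x) \le -c$ on a neighborhood U of q (which exists by strictness and continuity), and an orbit that starts at rest and remains in U. The actual proof is in Appendix C.

```latex
% While x(t) stays in U, the relative scores of the non-equilibrium actions obey
z_{k\mu}^{(n)}(t) \le -c
\quad \Longrightarrow \quad
z_{k\mu}(t) \le z_{k\mu}(0) - c\, \frac{t^n}{n!}
\qquad \text{(zero initial derivatives)}.
% The reduced Gibbs map then bounds the mass away from the equilibrium action:
1 - x_{k,0}(t)
  = \frac{\sum_{\mu} e^{z_{k\mu}(t)}}{1 + \sum_{\mu} e^{z_{k\mu}(t)}}
  \le \Big( \sum_{\mu} e^{z_{k\mu}(0)} \Big) \exp\!\big( -c\, t^n / n! \big).
```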



14 We thank Josef Hofbauer for this remark.

15 Recall that V is canonically embedded in $\Omega$ via the rest map $x \in X \mapsto (x, 0, \dots, 0) \in \Omega$.

Theorem 5.1 and Proposition 5.2 (see App. C for the proof) characterize the
behavior of the n-th order replicator dynamics near strict equilibria from both a
qualitative and a quantitative viewpoint; on the flip side, they do little to address
mixed equilibria. To study this issue, recall first that the standard asymmetric
replicator dynamics preserve a certain volume form in the interior of X, so
mixed equilibria cannot be attracting in first order. Ritzberger and Weibull (1995)
establish this "incompressibility" property of the replicator dynamics by taking
an ingenious extrinsic reparametrization which makes the replicator dynamics
divergence-free in the interior of X (see also Ritzberger and Vogelsberger, 1990).
On the other hand, Hofbauer and Sigmund (1988) rely implicitly on the properties
of the Gibbs map (GM), and essentially show that the replicator dynamics
are incompressible in the space of the score variables $y_{k\alpha}$ (see also Hofbauer,
1996). In first order, this idea can be used to show that the generalized imitative
dynamics (GD$_n$) are volume-preserving whenever the payoff observables $w_{k\alpha}$ do
not depend on $x_{k\alpha}$; in higher orders however, the dynamics are further decoupled
because the $w_{k\alpha}$ are only tied to the players' mixed strategies and not their
velocities. As a result, we obtain:

Proposition 5.3. The flow of the generalized learning dynamics (GLD$_n$) is volume-
preserving in the usual Euclidean geometry for all $n \geq 2$; the same holds for (GD$_n$)
w.r.t. a non-Euclidean volume form on the system's phase space $\Omega$.

More importantly, by using Proposition 5.3, we can prove the stronger result: 

Theorem 5.4. In the generalized higher order dynamics (GD$_n$), $n \geq 2$, interior points
cannot attract open sets of initial states; only vertices of $X$ can be attracting. More
generally, a non-pure point $q \in X$ can only attract relatively open sets of initial states
whose support in $X$ properly contains that of $q$.

It is important to stress that Theorem 5.4 clashes rather strongly with the first
order case $n = 1$. For instance, if we take Maynard Smith's payoff-adjusted
variant of the replicator dynamics (whereby players divide (RD$_1$) by their average
payoffs), then interior equilibria may become asymptotically stable (e.g.
as in the Matching Pennies example of Weibull, 1995). In higher orders however,
this is no longer the case: the learning dynamics (LD$_n$) endow orbits with a
tangential acceleration component, and this acceleration carries them away from
interior equilibria and towards the boundary of $X$.
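
As a rough numerical illustration of this drift (a sketch under stated assumptions, not the argument used in the proof), consider second order learning in a Matching Pennies game with payoffs $\pm 1$, unit temperatures, and two actions per player. Writing $z_k$ for the score difference between player $k$'s two actions and $x_k = 1/(1 + e^{-z_k})$, the score dynamics reduce to $\ddot z_1 = 2\tanh(z_2/2)$, $\ddot z_2 = -2\tanh(z_1/2)$. The payoff normalization, the integrator, and the function name are all choices made for this example.

```python
import math

def max_distance_from_interior(z1, z2, T=12.0, dt=1e-3):
    """Integrate z1'' = 2 tanh(z2/2), z2'' = -2 tanh(z1/2) from rest and
    return the largest distance from the interior equilibrium z = (0, 0)."""
    v1 = v2 = 0.0
    max_dist = math.hypot(z1, z2)
    for _ in range(int(T / dt)):
        a1 = 2.0 * math.tanh(z2 / 2.0)    # payoff gap of player 1: 2(2 x_2 - 1)
        a2 = -2.0 * math.tanh(z1 / 2.0)   # payoff gap of player 2: -2(2 x_1 - 1)
        v1 += dt * a1                      # semi-implicit Euler: velocities first,
        v2 += dt * a2
        z1 += dt * v1                      # then positions with the new velocities
        z2 += dt * v2
        max_dist = max(max_dist, math.hypot(z1, z2))
    return max_dist

if __name__ == "__main__":
    print(max_distance_from_interior(0.01, 0.0))
```

Starting at rest from a small displacement, the orbit in this example spirals away from the interior equilibrium rather than towards it, in line with the linearization $\ddot z_1 = z_2$, $\ddot z_2 = -z_1$, which has eigenvalues with positive real part.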

This tangential component can be tempered by including friction terms of the 
form F = —y. In that case, the resulting dynamics cease to be incompressible and 
become dissipative, so trajectories may well converge to interior states; however, 
due to space limitations, we will not address this issue here. 

6. Concluding Remarks and Future Directions 

The results in the present paper suggest that higher order dynamics open the 
door to some intriguing new questions and directions in the study of learning 
and evolution in games. For one, the elimination of weakly dominated strategies 
is a key feature of higher order dynamics which puts them firmly apart from all 
their first order siblings; for another, the potential survival of iteratively weakly 
dominated strategies shows that the higher order landscape is also quite far from 
being a flat, always-predictable one. Moreover, even though the classes of higher
order imitative dynamics considered here do not converge to interior equilibria,
the very nature of higher order learning allows the introduction of friction terms
which slow down trajectories and allow them to converge to interior states (a
forthcoming result). In fact, we have only scratched the surface here: the higher
order regime offers an extremely wide array of adjusted or altogether new classes
of dynamics that simply cannot be obtained in a first order setting, and where
the impossibility theorem of Hart and Mas-Colell (2003) no longer bars the way.

FIGURE 3. Second order replicator trajectories in a 2 x 2 coordination game
with payoff matrices $U_1 = U_2 = 1$. The figure to the right shows the restriction of
the game's phase space to the symmetric invariant manifold $X_0$ which joins the
game's equilibria. For every symmetric initial point $x$ near $q = (0, 0)$, there exists
a neighborhood of initial states $V_x$ (gray) such that all orbits starting in $V_x$ stay
close and eventually converge to $q$. The union $W$ of these neighborhoods (light
blue) is not itself a neighborhood of $q$ in $\Omega_0 = \Omega(X_0)$, so $q$ is not asymptotically
stable in (RD$_n$); however, in terms of the score variables $z = \operatorname{logit} x$,
$\dot z = \dot x / (x(1 - x))$, the corresponding point at infinity $(-\infty, -\infty)$ is
asymptotically stable in (LD$_n$) (inlay).

We should also stress here that it is hard to overestimate the role that the
Gibbs choice model (GM) plays in this higher order environment. In particular,
the form and properties of the (game-independent) adjustment term (3.16) in
the higher order replicator dynamics (RD$_n$) are direct consequences of the Gibbs
model (GM), so one might ask what would happen with a different adjustment
term, possibly unrelated to (GM). Surprisingly (and despite the fact that (RD$_n$)
retains many of its rationality properties if the replicator term is replaced by
some other payoff-monotonic term), if the adjustment term (3.16) is multiplied by
something as innocuous as a real number $a \in (0,1)$, then even strictly dominated
strategies may survive (a forthcoming result). As such, the caution underlying our
derivation of (RD$_n$) and (GD$_n$) seems to be well-justified: in the absence of a solid
learning scheme, a naive incorporation of higher order effects à la (3.2b') may
well lead to undesirable results.

That said, even in our exponential learning framework, there still remain many
open (and important) questions. For instance, when is this learning process
consistent (e.g. externally, as in Sorin, 2009)? What can we expect in symmetric,
single-population environments (where payoffs are no longer multilinear)
or with respect to setwise solution concepts (such as sets that are closed under
better replies as in Ritzberger and Weibull, 1995)? More generally, what
can we expect if we move beyond a dynamical framework altogether and replace
the learning process (3.13) with a more general integral equation of the
form $y_{k\alpha}(t) = \int_0^t \phi(t - s)\, u_{k\alpha}(x(s))\, ds$, where the "learning kernel" $\phi$ describes
the weight that players assign to their past observations? The n-th order replicator
dynamics correspond to choosing monomial learning kernels of the form
$\phi(t) = t^{n-1}$ (up to a constant factor), but how would learning be affected by e.g. an
exponential discounting (or reinforcement) of the past?
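
The link between monomial kernels and the n-th order scheme can be checked numerically. The sketch below assumes zero initial scores and score derivatives, and uses the normalized kernel $\phi(t) = t^{n-1}/(n-1)!$ from Cauchy's formula for repeated integration, so that $y^{(n)} = u$ exactly; the normalization and the sample payoff stream are assumptions made for this illustration.

```python
import math

def kernel_score(t, n, payoff, steps=20000):
    """Trapezoidal approximation of
        y(t) = int_0^t (t - s)^(n-1) / (n-1)! * payoff(s) ds,
    i.e. the score produced by a monomial learning kernel."""
    h = t / steps
    norm = math.factorial(n - 1)
    total = 0.0
    for i in range(steps + 1):
        s = i * h
        weight = 0.5 if i in (0, steps) else 1.0   # trapezoid end-point weights
        total += weight * (t - s) ** (n - 1) / norm * payoff(s)
    return total * h

if __name__ == "__main__":
    # With payoff(s) = cos(s), solving y''' = cos(t) from rest gives y(t) = t - sin(t).
    print(kernel_score(2.0, 3, math.cos), 2.0 - math.sin(2.0))
```

For $n = 1$ the kernel is constant and the score is simply the cumulative payoff, which is the usual first order case.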

Finally, it is important to note that our approach has been focused on contin- 
uous time with players being able to observe (or otherwise calculate) the payoffs 
associated to mixed strategies (the last assumption being relatively harmless in 
a nonatomic population setting, but potentially important from a discrete point 
of view). This choice has been a conscious one and it was motivated by the fact 
that our goal was simply to illustrate the rationality properties of the limiting 
continuous-time dynamics; the subtleties (and there are many!) of the descent 
from the continuous to the discrete and from the population to the atom would 
take us too far afield, so we leave this issue as a future direction (the papers by 
Rustichini, 1999, and Sorin, 2009, may serve as an indication of what to expect in 
the discrete case). Needless to say however, these are all directions that would 
take much more than a single paper to explore, so we chose to exhibit here only 
some of the most prominent features and subtleties of the higher order landscape, 
deferring these questions to future investigations. 
Appendix A. Derivation of Higher Order Dynamics

The starting point for the derivation of (RD$_n$) is the identity

$$ \log(x_{k\alpha}) - \log(x_{k\beta}) = \lambda_k (y_{k\alpha} - y_{k\beta}), \tag{A.1} $$

an easy consequence of (GM). Hence, by Faà di Bruno's higher order chain rule
(Fraenkel, 1978), we readily obtain:

$$ \frac{d^n}{dt^n} \log(x_{k\alpha}(t)) = \sum \frac{n!}{m_1! \cdots m_n!} \, \frac{(-1)^{m-1} (m-1)!}{x_{k\alpha}^{m}} \prod_{r=1}^{n} \left( \frac{x_{k\alpha}^{(r)}}{r!} \right)^{m_r}, \tag{A.2} $$

where we have set $m = m_1 + \dots + m_n$ and the sum is taken over all non-negative
integers $m_1, \dots, m_n$ such that $\sum_{r=1}^{n} r\, m_r = n$. In particular, since the only term that
contains $x_{k\alpha}^{(n)}$ has $m_1 = m_2 = \dots = m_{n-1} = 0$ and $m_n = 1$, we may rewrite (A.2)
as:

$$ \frac{d^n}{dt^n} \log(x_{k\alpha}(t)) = \frac{x_{k\alpha}^{(n)}}{x_{k\alpha}} + R^{n-1}_{k\alpha}\big(x(t), \dot x(t), \dots, x^{(n-1)}(t)\big), \tag{A.3} $$

where $R^{n-1}_{k\alpha}$ denotes the $(n-1)$-th order remainder of the RHS of (A.2) and is
given by (3.16). Then, by taking the n-th derivative of (A.1) and substituting, we
get:

$$ \lambda_k (u_{k\alpha} - u_{k\beta}) = \frac{x_{k\alpha}^{(n)}}{x_{k\alpha}} - \frac{x_{k\beta}^{(n)}}{x_{k\beta}} + R^{n-1}_{k\alpha} - R^{n-1}_{k\beta}, \tag{A.4} $$

and (RD$_n$) follows by applying $\sum_\beta x_{k\beta}(\cdot)$ on both sides (recall that
$\sum_\beta x_{k\beta} = 1$, so $\sum_\beta x_{k\beta}^{(n)} = 0$).
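
The combinatorial sum in (A.2) is easy to misread, so a small exact check may help. The sketch below evaluates the right-hand side of (A.2) for a single coordinate from the list of derivatives $[x, \dot x, \dots, x^{(n)}]$ at one point in time; the function names and the test function $x(t) = 1 + t^2$ are choices made for this example.

```python
from fractions import Fraction
from math import factorial

def index_tuples(n):
    """Yield all (m_1, ..., m_n) of non-negative integers with sum_r r*m_r = n."""
    def rec(r, remaining):
        if r > n:
            if remaining == 0:
                yield ()
            return
        for m_r in range(remaining // r + 1):
            for rest in rec(r + 1, remaining - r * m_r):
                yield (m_r,) + rest
    yield from rec(1, n)

def dlog(n, derivs):
    """n-th derivative of log x(t), given derivs = [x, x', ..., x^(n)], via (A.2)."""
    x = Fraction(derivs[0])
    total = Fraction(0)
    for ms in index_tuples(n):
        m = sum(ms)
        coeff = Fraction(factorial(n))
        prod = Fraction(1)
        for r, m_r in enumerate(ms, start=1):
            coeff /= factorial(m_r)
            prod *= (Fraction(derivs[r]) / factorial(r)) ** m_r
        total += coeff * (-1) ** (m - 1) * factorial(m - 1) / x ** m * prod
    return total

if __name__ == "__main__":
    # x(t) = 1 + t^2 at t = 1 has x = 2, x' = 2, x'' = 2, x''' = 0.
    print([dlog(n, [2, 2, 2, 0]) for n in (1, 2, 3)])
```

For this test function, direct differentiation of $\log(1 + t^2)$ gives $2t/(1+t^2)$, $2(1-t^2)/(1+t^2)^2$ and $4t(t^2-3)/(1+t^2)^3$, which can be compared with the values returned at $t = 1$.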

Appendix B. On the Elimination of Dominated Strategies 

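Proof of Theorem 4.1. We will begin by showing that if $q_k \in X_k$ is dominated by
$q'_k \in X_k$, then $D_{\mathrm{KL}}(q_k \,\|\, x_k(t)) \geq c\,t^n/n!$ for some positive constant $c > 0$. Indeed,
let $V_k(x) = D_{\mathrm{KL}}(q_k \,\|\, x_k) - D_{\mathrm{KL}}(q'_k \,\|\, x_k)$, and rewrite the Gibbs distribution
(GM) as $\log x_{k\alpha} = \lambda_k y_{k\alpha} - \log(Z(y))$ where $Z(y) = \sum_\beta \exp(\lambda_k y_{k\beta})$ is the partition
function of player $k$. Then, some algebra yields:

$$ \begin{aligned}
V_k(x) &= \sum_{\alpha \in \operatorname{supp}(q_k)} q_{k\alpha} \log(q_{k\alpha}/x_{k\alpha}) - \sum_{\alpha \in \operatorname{supp}(q'_k)} q'_{k\alpha} \log(q'_{k\alpha}/x_{k\alpha}) \\
&= \sum_\alpha (q'_{k\alpha} - q_{k\alpha}) \log x_{k\alpha} + h_k(q_k, q'_k) \\
&= \lambda_k \sum_\alpha (q'_{k\alpha} - q_{k\alpha})\, y_{k\alpha} + h_k(q_k, q'_k),
\end{aligned} \tag{B.1} $$

where $h_k(q_k, q'_k)$ is a constant depending only on $q_k$ and $q'_k$, and the last equality
follows from the fact that $\sum_\alpha (q'_{k\alpha} - q_{k\alpha}) \log Z = 0$ (recall that $\sum_\alpha q_{k\alpha} = \sum_\alpha q'_{k\alpha} = 1$).
In this way, we readily obtain:

$$ \begin{aligned}
\frac{d^n}{dt^n} V_k(x(t)) &= \lambda_k \sum_\alpha (q'_{k\alpha} - q_{k\alpha})\, y^{(n)}_{k\alpha} = \lambda_k \sum_\alpha (q'_{k\alpha} - q_{k\alpha})\, u_{k\alpha}(x(t)) \\
&= \lambda_k \left[ u_k(q'_k; x_{-k}(t)) - u_k(q_k; x_{-k}(t)) \right] \geq \lambda_k \Delta u_k > 0,
\end{aligned} \tag{B.2} $$

where the constant $\Delta u_k$ is defined as $\Delta u_k = \min_{x_{-k}} \{ u_k(q'_k; x_{-k}) - u_k(q_k; x_{-k}) \}$
and its positivity follows from the compactness of $X$. Hence, if we set
$c_r = (r!)^{-1} \left. \frac{d^r}{dt^r} V_k(x(t)) \right|_{t=0}$, $r = 0, \dots, n-1$, Taylor's theorem with Lagrange
remainder readily gives:

$$ V_k(x(t)) \geq \lambda_k \Delta u_k\, t^n/n! + \sum_{r=0}^{n-1} c_r t^r, \tag{B.3} $$

and our assertion follows by noting that $D_{\mathrm{KL}}(q_k \,\|\, x_k(t)) \geq V_k(x(t))$. In particular,
for pure strategies $\alpha \prec \beta$, we will have $V_k(x(t)) = \log(x_{k\beta}(t)/x_{k\alpha}(t))$, so (B.3)
gives:

$$ \log\left(x_{k\beta}(t)/x_{k\alpha}(t)\right) \geq \lambda_k \Delta u_{\beta\alpha}\, t^n/n! + \mathcal{O}(t^{n-1}), \tag{B.4} $$

and (4.2) follows by exponentiating.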
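
The algebra behind (B.1) can be sanity-checked numerically. The sketch below compares the entropy difference $V_k$ with the second line of (B.1), taking $h_k(q_k, q'_k) = \sum_\alpha q_{k\alpha} \log q_{k\alpha} - \sum_\alpha q'_{k\alpha} \log q'_{k\alpha}$ (the explicit form of the constant is our reading of the derivation), and confirms the pure strategy special case used in (B.4). Function names and the sample vectors are illustrative.

```python
import math

def kl(q, x):
    """Kullback-Leibler divergence D_KL(q || x), with the convention 0 log 0 = 0."""
    return sum(qa * math.log(qa / xa) for qa, xa in zip(q, x) if qa > 0)

def entropy_gap(q, q_prime, x):
    """V(x) = D_KL(q || x) - D_KL(q' || x)."""
    return kl(q, x) - kl(q_prime, x)

def entropy_gap_linear(q, q_prime, x):
    """Second line of (B.1): sum_a (q'_a - q_a) log x_a + h(q, q')."""
    h = (sum(a * math.log(a) for a in q if a > 0)
         - sum(a * math.log(a) for a in q_prime if a > 0))
    return sum((qp - qa) * math.log(xa) for qa, qp, xa in zip(q, q_prime, x)) + h

if __name__ == "__main__":
    x = [0.2, 0.3, 0.5]
    print(entropy_gap([0.5, 0.5, 0.0], [0.1, 0.2, 0.7], x),
          entropy_gap_linear([0.5, 0.5, 0.0], [0.1, 0.2, 0.7], x))
```

For pure strategies $q_k = e_\alpha$, $q'_k = e_\beta$ the constant vanishes and both expressions reduce to $\log(x_{k\beta}/x_{k\alpha})$.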
Now, to establish the theorem for iteratively dominated strategies, we will
resort to induction on the rounds of elimination. To that end, assume that
$D_{\mathrm{KL}}(q_k \,\|\, x_k(t)) \geq |\mathcal{O}(t^n)|$ for all strategies $q_k \notin X^r_k$ that do not survive $r$
elimination rounds; in particular, if $\alpha \notin A^r_k \equiv A_k \cap X^r_k$, we assume that $x_{k\alpha}(t) \to 0$ as
$t \to \infty$. We will show that this also holds if $q_k \in X^r_k$ survives for $r$ deletion rounds
but dies in the subsequent one.

Indeed, if $q_k \in X^r_k \setminus X^{r+1}_k$, there will be some $q'_k \in X^r_k$ with $u_k(q'_k; x_{-k}) >
u_k(q_k; x_{-k})$ for all $x_{-k} \in X^r_{-k}$. With this in mind, decompose $x \in X$ as $x = x^r + z^r$
where $x^r$ denotes the "r-rationalizable" part of $x$, i.e. the orthogonal projection of
$x$ on the subspace of $X$ spanned by the surviving pure strategies $A^r_k$, $k \in N$. Then,
if we set $\Delta u^r_k = \min\{ u_k(q'_k; \alpha_{-k}) - u_k(q_k; \alpha_{-k}) : \alpha_{-k} \in A^r_{-k} \}$, we will also have by
(multi)linearity:

$$ u_k(q'_k; x^r_{-k}) - u_k(q_k; x^r_{-k}) \geq \Delta u^r_k > 0 \quad \text{for all } x_{-k} \in X_{-k}. \tag{B.5} $$

Moreover, it is easy to see that our induction hypothesis implies $z^r(t) \to 0$ as
$t \to \infty$ (recall that $x_{k\alpha}(t) \to 0$ for all $\alpha \notin A^r_k$), so for large enough $t$, we also get:

$$ \left| u_k(q'_k; z^r_{-k}(t)) - u_k(q_k; z^r_{-k}(t)) \right| < \Delta u^r_k / 2. \tag{B.6} $$

Hence, by combining (B.5) and (B.6), we obtain $u_k(q'_k; x_{-k}(t)) - u_k(q_k; x_{-k}(t)) >
\Delta u^r_k / 2$ for large $t$, and the induction is completed by plugging this last estimate
into (B.2) and proceeding as in the base case $r = 0$ (our earlier assertion). □

Proof of Theorem 4.3. The crucial point in the previous proof is the n-th derivative
of the entropy difference $V_k(x) = D_{\mathrm{KL}}(q_k \,\|\, x_k) - D_{\mathrm{KL}}(q'_k \,\|\, x_k)$ which determines
the rate of extinction of dominated strategies. Thus, by replacing $u$ by $w$ in (B.2)
and using the appropriate monotonicity condition for each case of dominance
(pure/mixed by pure/mixed), Theorem 4.3 follows by shadowing the proof of
Theorem 4.1. □

Proof of Theorem 4.4. Let $q_k \preccurlyeq q'_k$ and let $A'_{-k} = \{ \alpha_{-k} \in A_{-k} : u_k(q'_k; \alpha_{-k}) >
u_k(q_k; \alpha_{-k}) \}$ be the set of pure strategy profiles of $k$'s opponents against which
$q'_k$ yields a strictly greater payoff than $q_k$. Then, with notation as above, we will
have:

$$ \frac{d^n}{dt^n} V_k(x(t)) = \lambda_k \sum_{\alpha_{-k} \in A'_{-k}} \left[ u_k(q'_k; \alpha_{-k}) - u_k(q_k; \alpha_{-k}) \right] x_{\alpha_{-k}}(t), \tag{B.7} $$

where $x_{\alpha_{-k}} = \prod_{\ell \neq k} x_{\ell \alpha_\ell}$ denotes the $\alpha_{-k}$-th component of $x$. Thus, with $x(t)$
starting at rest, Faà di Bruno's formula gives $\left. \frac{d^r V_k}{dt^r} \right|_{t=0} = 0$ for all $r = 1, \dots, n-1$,
and a simple integration then yields:

$$ \frac{d^{n-1}}{dt^{n-1}} V_k(x(t)) = \lambda_k \sum_{\alpha_{-k} \in A'_{-k}} \left[ u_k(q'_k; \alpha_{-k}) - u_k(q_k; \alpha_{-k}) \right] \int_0^t x_{\alpha_{-k}}(s)\, ds. \tag{B.8} $$

However, with $x(t)$ interior, the integrals in the above equation will be positive
and increasing, so for some suitably chosen $c > 0$ and $t$ large enough, we obtain

$$ \frac{d^{n-1}}{dt^{n-1}} V_k(x(t)) \geq \lambda_k c > 0, \tag{B.9} $$

and our claim follows from an $(n-1)$-fold application of the mean value theorem. □
Appendix C. Convergence and Non-Convergence Results

Proof of Theorem 5.1. We will begin with stationarity of restricted equilibria. Indeed,
since the payoff term of (RD$_n$) does not contain any higher order derivatives,
it will vanish at $q \in X$ if and only if $u_{k\alpha}(q) = u_k(q)$ for all $\alpha \in \operatorname{supp}(q_k)$,
implying that $q$ is a restricted equilibrium. Conversely, let $q$ be a Nash equilibrium
in the subgame $\mathfrak{G}' \equiv \mathfrak{G}(N, X', u|_{X'})$ with $A'_k = \operatorname{supp}(q_k)$. Then, with
$u_{k\alpha}(q) = u_{k\beta}(q)$ for all $\alpha, \beta \in A'_k$, the updating scheme (LD$_n$) constrained to $\mathfrak{G}'$
and starting at $q$ also gives $y^{(n)}_{k\alpha}(0) = y^{(n)}_{k\beta}(0)$ for all $\alpha, \beta \in A'_k$. So, if (RD$_n$) starts at
$q$ with initial motion rates $\dot x(0) = \ddot x(0) = \dots = 0$, we will have $y_{k\alpha}(t) - y_{k\alpha}(0) =
y_{k\beta}(t) - y_{k\beta}(0)$ for all $\alpha, \beta \in A'_k$ and, by the homogeneity of the Gibbs map
($G(y_1 + c, y_2 + c, \dots) = G(y_1, y_2, \dots)$ for all $c \in \mathbb{R}$), we readily obtain $x(t) = q$
for all $t$, i.e. $q$ is stationary.$^{16}$

We now turn to Part (III) of the theorem, which will also prove Part (II). To that
end, suppose that every neighborhood $U$ of $q$ in $X$ admits an interior orbit $x(t)$
that stays in $U$ for all $t \geq 0$; we then claim that $q$ is Nash. Indeed, assume instead
that for some $k \in N$, there exist $\beta \in A_k$ and $\alpha \in \operatorname{supp}(q_k)$ with $u_{k\alpha}(q) < u_{k\beta}(q)$.
Then, pick $\varepsilon > 0$ and a neighborhood $U$ of $q$ such that $x_{k\alpha} > q_{k\alpha}/2 > 0$ and
$u_{k\beta}(x) > u_{k\alpha}(x) + \varepsilon$ for all $x \in U$. By assumption, there exists an interior orbit
$x(t)$ which stays in $U$ for all time, so, for the associated score variables $y(t)$, we
will have:

$$ y^{(n)}_{k\beta}(t) - y^{(n)}_{k\alpha}(t) = u_{k\beta}(x(t)) - u_{k\alpha}(x(t)) > \varepsilon > 0. $$

This last inequality immediately implies that $\log(x_{k\beta}(t)/x_{k\alpha}(t)) \to +\infty$, contradicting
the fact that $x_{k\alpha}(t) > q_{k\alpha}/2$ for all $t \geq 0$.

With regard to Part (IV), let $q = (e_{1,0}, \dots, e_{N,0})$ be a strict equilibrium of $\mathfrak{G}$,
and consider the relative scores $z_{k\mu} = y_{k\mu} - y_{k,0}$, $\mu \in A^*_k$. It is then easy
to see that the reduced Gibbs map $G^*_k : \mathbb{R}^{A^*_k} \to X_k$ of (GM*) is a diffeomorphism
onto its image, so the same will hold for the direct sum $G^* = \bigoplus_k G^*_k : \mathbb{R}^{A^*} \to X$
as well. Accordingly, if we take a neighborhood $U_\varepsilon$ of $q$ in $X$ of the form
$U_\varepsilon = \{ x \in \operatorname{int}(X) : x_{k,0} > 1 - \varepsilon,\ k \in N \}$, its preimage under $G^*$ will be the
set $V_h = \{ z \in \mathbb{R}^{A^*} : \sum_\mu e^{z_{k\mu}} < h,\ k \in N \}$ where $h = (1 - \varepsilon)^{-1} - 1$ ($\approx \varepsilon$ for small
$\varepsilon$). We will show that if $h$ is chosen small enough, then there exists $\delta > 0$ such
that whenever a solution $z(t)$ of (ZD$_n$) starts at $z(0) \in V_h$ with $\|z^{(r)}(0)\| < \delta$ for
$r = 1, \dots, n-1$, we have $z(t) \in V_{2h}$ for all $t \geq 0$ and $z_{k\mu}(t) \to -\infty$ for all $\mu \in A^*_k$,
$k \in N$. Since $G^*$ is a diffeomorphism onto its image and $x \to q$ iff $z_{k\mu} \to -\infty$ for
all $\mu \in A^*_k$, $k \in N$, this will establish the "if" direction of our claim.$^{17}$
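
The correspondence between the neighborhood $U_\varepsilon$ and its preimage $V_h$ can be made concrete with a few lines of code. The sketch below assumes unit temperature in the reduced Gibbs map and a base strategy (index 0) whose score is normalized to zero, so that $x_{k,0} = 1/(1 + \sum_\mu e^{z_{k\mu}})$; the function names are illustrative.

```python
import math

def reduced_gibbs(z):
    """Map relative scores z_mu (all strategies except the base one) to the
    mixed strategy (x_0, x_1, ...), with the base score fixed at zero."""
    weights = [math.exp(v) for v in z]
    total = 1.0 + sum(weights)
    return [1.0 / total] + [w / total for w in weights]

def h_of(eps):
    """Radius h = (1 - eps)^(-1) - 1 of the preimage of {x_0 > 1 - eps}."""
    return 1.0 / (1.0 - eps) - 1.0

def in_V(z, h):
    """Membership test for V_h, for a single player."""
    return sum(math.exp(v) for v in z) < h

if __name__ == "__main__":
    eps = 0.1
    z = [math.log(h_of(eps) / 4.0)] * 2       # sum of exponentials equals h/2
    print(in_V(z, h_of(eps)), reduced_gibbs(z)[0] > 1.0 - eps)
```

Points with $\sum_\mu e^{z_{k\mu}} < h$ are mapped to strategies with $x_{k,0} > 1 - \varepsilon$, and the boundary $\sum_\mu e^{z_{k\mu}} = h$ is mapped to $x_{k,0} = 1 - \varepsilon$.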

$^{16}$ It is important to note here that Nash equilibria are not stationary in (LD$_n$): trajectories that are
stationary in $X$ may track a line parallel to the vector $(1, \dots, 1)$ in $\mathbb{R}^{A}$.

$^{17}$ Non-interior trajectories can be handled similarly by looking at the appropriate subgame.

Indeed, let $z(t)$ be a solution of (ZD$_n$) starting in $V_h$, and let $\tau_{2h} = \inf\{ t :
z(t) \notin V_{2h} \}$ be the time it takes $z(t)$ to escape from $V_{2h}$ (with the usual convention
$\inf(\emptyset) = \infty$). Then, if $h$ is taken small enough, there will be some constant $M > 0$
such that $u_{k,0}(x) - u_{k\mu}(x) \geq M > 0$ for all $x \in G^*(V_{2h})$ (recall that $q$ is a strict
equilibrium), so for $t \leq \tau_{2h}$, Taylor's theorem with Lagrange remainder applied
to (ZD$_n$) gives:

$$ z_{k\mu}(t) \leq z_{k\mu}(0) + \sum_{r=1}^{n-1} z^{(r)}_{k\mu}(0)\, t^r/r! - M t^n/n!. \tag{C.1} $$

Hence, pick $\delta > 0$ such that the maximum of the polynomial $\sum_{r=1}^{n-1} z^{(r)}_{k\mu}(0)\, t^r/r! -
M t^n/n!$ for $t \geq 0$ is strictly smaller than $\log 2$ whenever $|z^{(r)}_{k\mu}(0)| < \delta$ for $\mu \in A^*_k$,
$k \in N$, and $r = 1, \dots, n-1$. This readily yields $\sum_\mu \exp(z_{k\mu}(\tau_{2h})) < \sum_\mu \exp(z_{k\mu}(0) +
\log 2) = 2 \sum_\mu \exp(z_{k\mu}(0)) < 2h$, i.e. $z(\tau_{2h}) \in V_{2h}$, a conclusion which cannot hold unless
$\tau_{2h} = \infty$. We thus obtain $z(t) \in V_{2h}$ for all $t \geq 0$, so the limit of (C.1) as
$t \to \infty$ gives $z_{k\mu}(t) \to -\infty$. Then, by using Taylor expansions of a lower order,
one can show in a similar fashion that the same also holds for the derivatives
$\dot z_{k\mu}(t), \dots, z^{(n-1)}_{k\mu}(t)$, as was to be shown.
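
The choice of $\delta$ in this step can be explored numerically. The sketch below bounds the polynomial in (C.1) by its worst case, in which every initial derivative equals $\delta$, and checks on a time grid whether its maximum stays below $\log 2$. The grid search, its parameters, and the function names are assumptions of this example; for $n = 2$ the maximum is $\delta^2/(2M)$ in closed form.

```python
import math

def worst_case_max(delta, M, n, t_max=50.0, steps=50000):
    """Grid maximum over t in [0, t_max] of
        sum_{r=1}^{n-1} delta * t^r / r!  -  M * t^n / n!,
    an upper bound for the polynomial in (C.1) when |z^(r)(0)| < delta."""
    best = 0.0
    for i in range(steps + 1):
        t = t_max * i / steps
        value = sum(delta * t**r / math.factorial(r) for r in range(1, n))
        value -= M * t**n / math.factorial(n)
        best = max(best, value)
    return best

def admissible(delta, M, n):
    """True if the worst-case maximum is strictly below log 2."""
    return worst_case_max(delta, M, n) < math.log(2.0)

if __name__ == "__main__":
    print(worst_case_max(1.0, 1.0, 2), admissible(1.0, 1.0, 2), admissible(2.0, 1.0, 2))
```

Since the leading term $-M t^n/n!$ eventually dominates, a finite grid suffices for moderate parameter values; larger $\delta/M$ ratios may require a larger `t_max`.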

For the converse implication, it is easy to show that any vertex $q$ of $X$ which
attracts an open neighborhood of initial rest states must also be a strict Nash
equilibrium: extending the reasoning of Ritzberger and Weibull (1995, Thm. 1) to
our higher order setting, it suffices to consider the evolution of the dynamics in
the edge which joins $q = (\alpha_k; \alpha_{-k})$ to a vertex $q' = (\alpha'_k; \alpha_{-k})$ with $u_k(q') \geq u_k(q)$.
However, Theorem 5.4 shows that only a vertex $q \in X$ can attract an open set of
initial states $\omega \in \Omega$ which contains a punctured neighborhood of $q$ in $X$, so our
assertion follows. □

Proof of Proposition 5.2. By Theorem 5.1, we know that if $x(t)$ starts close enough
to $q$ and is initially at rest, then it will always remain close to $q$. As such, by choosing
a sufficiently small neighborhood of initial positions, the payoff differences
$u_{k,0}(x(t)) - u_{k\mu}(x(t))$ will be bounded away from 0 by some positive constant
$c > 0$ for all $\mu \in A^*_k$, $k \in N$ and for all $t \geq 0$, so (ZD$_n$) gives $z^{(n)}_{k\mu} \leq -c < 0$
as well, and our assertion follows from an $(n-1)$-fold application of the mean
value theorem. □

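Proof of Proposition 5.3. Similarly to (3.14), (GLD$_n$) can be written as a first order
system in the variables $y^{(0)}_{k\alpha}, \dots, y^{(n-1)}_{k\alpha}$:

$$ \dot y^{(r)}_{k\alpha}(t) = y^{(r+1)}_{k\alpha}(t), \quad r = 0, \dots, n-2, \qquad \dot y^{(n-1)}_{k\alpha}(t) = w_{k\alpha}(x(t)), \tag{C.2} $$

so, given that $y^{(r)}_{k\alpha}$ does not appear in the equation for $\dot y^{(r)}_{k\alpha}$, it follows that the flow
of (GLD$_n$) is incompressible in the standard Euclidean metric of $\mathbb{R}^{A}$.$^{18}$ Using
the relative scores $z$, the same argument applies to the dynamics (ZD$_n$) with $u$
replaced by $w$, and since $G^*$ is a local diffeomorphism, the result carries over to
(GD$_n$) as well. □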
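
The incompressibility argument amounts to saying that the Jacobian of the first order system has zero trace. The sketch below checks this by central differences for $n = 2$ in a two-player, two-action example in relative scores, with an illustrative Matching Pennies payoff term and unit temperatures. For contrast, it also includes a hypothetical linear friction term $-\eta v$ (the coefficient `eta` and this specific form are assumptions of the example), for which the divergence becomes $-2\eta$.

```python
import math

def field(state, eta=0.0):
    """First order form of z'' = w(z) - eta * z' with state = (z1, z2, v1, v2).
    eta = 0 gives the frictionless second order score dynamics."""
    z1, z2, v1, v2 = state
    a1 = 2.0 * math.tanh(z2 / 2.0)     # depends on the opponent's score only
    a2 = -2.0 * math.tanh(z1 / 2.0)
    return (v1, v2, a1 - eta * v1, a2 - eta * v2)

def divergence(f, state, h=1e-5):
    """Trace of the Jacobian of f at `state`, by central differences."""
    div = 0.0
    for i in range(len(state)):
        up = list(state)
        dn = list(state)
        up[i] += h
        dn[i] -= h
        div += (f(up)[i] - f(dn)[i]) / (2.0 * h)
    return div

if __name__ == "__main__":
    point = (0.3, -1.2, 0.5, 0.1)
    print(divergence(field, point))
    print(divergence(lambda s: field(s, eta=0.25), point))
```

In the frictionless case no component of the vector field depends on its own coordinate, so the divergence vanishes identically; with friction the flow contracts phase space volume at a constant rate.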

$^{18}$ This remains true for $n = 1$ whenever $\partial u_{k\alpha}/\partial x_{k\alpha} = 0$, i.e. for the replicator dynamics (or, more
generally, if $\operatorname{div} w = 0$).

Proof of Theorem 5.4. We will prove that if $q \in \operatorname{int}(X)$, then there is no open set of
initial conditions in $\Omega$ that converges to $q$. The result for general non-pure $q \in X$
will then follow by focusing on the face $X'$ of $X$ which is spanned by the support
of $q$, i.e. $X' = \prod_k \Delta(A'_k)$ with $A'_k = \operatorname{supp}(q_k)$; since the dynamics (GD$_n$) preserve
the faces of $X$, the assertion follows by noting that the intersection of $X'$ with an
open set in $X$ is open in $X'$ by definition.

Working with the variables $z$ of (3.17) and recalling that the map $G^* : z \mapsto x$ is a
local diffeomorphism, Proposition 5.3 shows that open sets of initial states in $\Omega_w$,
the phase space of the dynamics (ZD$_n$) with $u$ replaced by $w$, cannot converge to
the interior state $((G^*)^{-1}(q), 0, \dots, 0)$. Thus, to show that $z(t)$ cannot converge to
the interior point $z^* = (G^*)^{-1}(q)$, it suffices to show that $z(t) \to z^*$ would also
imply $\lim_{t\to\infty} \dot z(t) = \lim_{t\to\infty} \ddot z(t) = \dots = 0$.

For notational simplicity we will only prove the case $n = 2$. To that end,
assume ad absurdum that $z_{k\mu}(t) \to z^*_{k\mu}$ but $\dot z_{k\mu}(t) \not\to 0$ for some $\mu \in A^*_k$, $k \in N$.
Then, without loss of generality, there exist $\varepsilon > 0$ and an increasing sequence
of times $t_n \to \infty$ such that $\dot z_{k\mu}(t_n) > \varepsilon$ for all $n$. Thus, let $J_n$ be the largest open
interval which contains $t_n$ and which is such that $\dot z_{k\mu} > \varepsilon/2$ in $J_n$. Then, the
measure $\delta_n = m(J_n)$ of $J_n$ must vanish as $n \to \infty$; otherwise, and by passing to
a subsequence of $t_n$ if necessary, $\delta_n$ would always exceed some positive $\delta > 0$,
implying that $z_{k\mu}(t)$ grows by at least $\varepsilon\delta/2$ in $J_n$ for all $n$, a contradiction (recall
that $z_{k\mu}(t) \to z^*_{k\mu}$, so all subsequences of $z_{k\mu}(t_n)$ are Cauchy). Thus, given that
$\dot z_{k\mu}(t_n) > \varepsilon$ by assumption, the mean value theorem reveals that there exists some
$\xi_n \in J_n$ with $\ddot z_{k\mu}(\xi_n) > \varepsilon/(2\delta_n)$. However, since $z^*$ must also be a rest point of
(ZD$_n$), the dynamics (ZD$_n$) give $\ddot z_{k\mu}(t) \to 0$ as $t \to \infty$, a contradiction which
proves our claim. □